Model-based self-optimization

Centre for Advanced Studies
Hierarchical Model-based Autonomic Control of
Software Systems
Marin Litoiu, IBM CAS Toronto
Murray Woodside, Tao Zheng, Carleton University
May 21, 2005
© 2005 IBM Corporation
Outline
 Motivation
 Hierarchical Control
 Performance Models
 Conclusions
A Typical Deployment: Data Centres
[Figure: a typical data centre deployment, governed by SLAs: clients connect to web servers, which call application servers and data servers hosting the application components.]
Self-optimization
 Automated allocation of software and hardware resources to accomplish a performance goal by optimizing a cost function in the presence of
– Workload variations
– Perturbations
– Changes in the environment
 Aims at
– Reducing the cost of ownership
– Improving QoS (dependability)
[Figure: response time varying over time]
Outline
 Motivation
 Hierarchical Control
 Performance Models
 Conclusions
Hierarchical Control(1)
[Figure: three-level control hierarchy over a managed (web) services system. Level 1 (component tuning): each managed component has a Tuning Manager with a TController, TModel, model builder, load monitor, sensors and effectors. Level 2 (application tuning): an Application Manager with a load-balancing controller (LController), LModel/AModel and model builder. Level 3 (provisioning): a Provisioning Controller with PController, PModel/CModel and decision elements. Goals (SLAs/SLOs) flow down the hierarchy, and each autonomic component couples a functional unit with a management unit.]
Hierarchical Control(2)
Hierarchical Control(3)
 The Controller
– Monitors the managed component's performance metrics, the input workload, and the setpoints (SLOs)
– Uses the performance model to estimate future metrics and future adjustments of the controlled parameters
– If the future workload cannot be accommodated by local adjustments, alerts the upper level; otherwise performs the local adjustments (sketched below)
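A minimal sketch of this decision logic, assuming a toy performance model and illustrative parameter names (users, threads, slo) that are not taken from the paper:

```python
def predict_response_time(users, threads, demand_per_request=0.02):
    """Toy stand-in for the performance model (seconds per request)."""
    capacity = threads / demand_per_request      # requests/s the thread pool can serve
    arrival = users * 0.5                        # assumed 0.5 requests/s per user
    if arrival >= capacity:
        return float("inf")                      # saturated: no finite response time
    return demand_per_request / (1 - arrival / capacity)

def control_step(users, threads, max_threads, slo=0.5):
    """Return ('tune', new_threads) if a local adjustment meets the SLO, else ('escalate', None)."""
    for candidate in range(threads, max_threads + 1):
        if predict_response_time(users, candidate) <= slo:
            return ("tune", candidate)           # a local adjustment suffices
    return ("escalate", None)                    # future workload cannot be handled locally

print(control_step(users=120, threads=10, max_threads=50))   # -> ('tune', 10)
```

The escalation branch is what ties a control level to the one above it.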
Hierarchical Control(4): Advantages
“When in doubt, mumble; when in trouble, delegate; when in charge, ponder.”
 The targeted systems are hierarchical
– Structure is hierarchical (containment)
– Goals are hierarchical
– Authority is hierarchical
 Provides several homeostatic control levers: if the 1st one fails, engage the 2nd; if the 2nd fails, engage the 3rd…
 Solves time scale and scope issues
 Reduces cognition, control and communication complexity
Agenda
 Motivation
 Hierarchical control
 Performance Models
 Conclusions
The Role of Performance Models
 “All models are wrong, some models are useful.”
 Prediction (forecast) role: tell what happens in the future
– If the workload increases by 100 users, the response time will increase by 5 s
 Estimation role: tell what happens now
– If I increase the number of threads or servers, or alter the software architecture, what is the estimated change in performance?
 Problem determination role
– Where is the bottleneck?
Queuing Network Models
[Figures: deployment of clients, web server, application server and data server; plot “Scalability of Auction Application”: measured vs. model response time (ms) for 1 to 100 clients.]
Mean Value Analysis (MVA) relations for a closed QNM with K devices and N clients:
R(N) = sum over i = 1..K of Di * [1 + Qi(N-1)]
X = N / R(N)
Ui = X * Di
where Di = service demand at device i, X = throughput, Ui = utilization of device i, Qi = queue length at device i. (A minimal code sketch follows.)
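A minimal sketch of the exact single-class MVA recursion above; the demand values in the example call are illustrative, not measurements from the auction application:

```python
def mva(demands, n_clients):
    """Exact single-class MVA for a closed QNM with K devices and no think time."""
    K = len(demands)
    Q = [0.0] * K                        # Q_i(0) = 0
    R, X = sum(demands), 0.0
    for n in range(1, n_clients + 1):
        R_i = [demands[i] * (1 + Q[i]) for i in range(K)]   # residence time per device
        R = sum(R_i)                     # R(n) = sum_i D_i * [1 + Q_i(n-1)]
        X = n / R                        # X(n) = n / R(n)
        Q = [X * r for r in R_i]         # Little's law per device
    U = [X * d for d in demands]         # U_i = X * D_i
    return R, X, U, Q

# Example with hypothetical demands (s) at web, app and data servers, 100 clients
R, X, U, Q = mva([0.005, 0.012, 0.008], 100)
print(round(R, 3), round(X, 1), [round(u, 2) for u in U])
```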
Layered Queuing Models (LQM)
 Layered Queuing Models (LQM) are analytic performance models that
– Extend Queuing Network Models (QNMs)
– Model queuing at software components: threading and data-connection pools, locks and critical sections
– Model multiple classes of requests
 LQM structure (a minimal sketch of such a model follows the figure)
– Software resource interactions: synchronous, asynchronous, forwarding calls
– Demands at hardware resources for each class of request; one user per class in the system
– Queuing centres: CPU, disk, network, threading and data-connection pools…
[Figure: LQM of the example: clients call the web/application server, which calls the data server; each node has CPU and disk queuing centres, with software layers (Layer 1) above the hardware layer (Layer 0).]
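To make the structure concrete, a minimal sketch of how such a model could be written down in code; the class, field names and numbers are illustrative assumptions, not the input format of any of the solvers mentioned later:

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    """One software server in the layered model."""
    name: str
    multiplicity: int                          # threads / connection-pool size
    cpu_demand: float                          # seconds of CPU per request
    disk_demand: float                         # seconds of disk per request
    calls: dict = field(default_factory=dict)  # callee name -> mean synchronous calls per request

lqm = [
    Task("Client", multiplicity=100, cpu_demand=0.001, disk_demand=0.0,
         calls={"WebAppServer": 1.0}),
    Task("WebAppServer", multiplicity=20, cpu_demand=0.012, disk_demand=0.004,
         calls={"DataServer": 2.0}),
    Task("DataServer", multiplicity=10, cpu_demand=0.006, disk_demand=0.010),
]
```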
Linearized Dynamic Models
 Linear regression models
x(k) = A x(k-1) + B u(k)
y(k) = C x(k) + D u(k)
x, u, y are vectors; A, B, C, D are matrices
 Consider multiple input, output and state variables
 Take into account the tendencies
 Advantages
– Take advantage of controller design techniques from system control
– Capture the transient behaviour
 Disadvantages
– Assume the system is linear
– The matrices A, B, C, D are experimentally identified
– Any change in the system will invalidate the model
[Figure: “Scalability of Auction Application”: measured response time vs. the linear model R(N) = C*N, for 1 to 100 clients.]
(A minimal simulation sketch follows.)
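A minimal simulation of the state-space form above; the matrices here are illustrative placeholders, since in practice A, B, C, D are identified experimentally:

```python
import numpy as np

A = np.array([[0.8]])   # state dynamics (e.g. smoothed server queue length)
B = np.array([[0.1]])   # effect of the input u (e.g. arrival rate)
C = np.array([[2.0]])   # map state to output y (e.g. response time)
D = np.array([[0.0]])

def simulate(u_sequence, x0=np.zeros((1, 1))):
    """Step x(k) = A x(k-1) + B u(k), y(k) = C x(k) + D u(k) through a sequence of inputs."""
    x, outputs = x0, []
    for u in u_sequence:
        u = np.atleast_2d(u)
        x = A @ x + B @ u
        outputs.append((C @ x + D @ u).item())
    return outputs

print(simulate([1.0, 1.0, 2.0, 2.0, 2.0]))   # transient response to a step in the input
```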
Threshold and Policy Models
 Decision is part of the model
– Monitor an output variable (utilization, queue length…)
– Increase/decrease a state variable
– “If the number of users increases by 10, increase the number of threads by 5”
– “If the utilization is greater than 50%, then provision a new server”
 More complex models
– Monitor more output variables
– Increase/decrease one output variable
 Advantages
– Very simple
– Very fast
 Disadvantages
– Not globally optimal, sometimes not even locally optimal
– Not able to handle changes in the system
[Figure: decision regions over two output measures y_1 and y_2: “add a unit to the pool if < M”, “remove a unit from the pool if > 1”, no change in between.]
(A minimal policy sketch follows.)
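A minimal sketch of such a threshold policy; the utilization thresholds and pool bounds are illustrative assumptions, not values from the paper:

```python
LOW, HIGH = 0.30, 0.50          # assumed utilization thresholds
MIN_POOL, MAX_POOL = 1, 20      # assumed pool-size bounds

def threshold_policy(utilization, pool_size):
    """Add a unit to the pool above HIGH utilization, remove one below LOW, else do nothing."""
    if utilization > HIGH and pool_size < MAX_POOL:
        return pool_size + 1    # e.g. "if the utilization is greater than 50%, provision a new server"
    if utilization < LOW and pool_size > MIN_POOL:
        return pool_size - 1
    return pool_size

print(threshold_policy(0.62, pool_size=4))   # -> 5
```

Being stateless and model-free is exactly what makes such a policy fast, and also what prevents it from adapting when the system itself changes.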
Tradeoffs…
 Layered Queuing Models
– Model non-linearities across wide domains
– Appropriate at application and system level
– Work with mean values
– Difficult to obtain service times per individual transaction (Kalman filters seem to help)
 Dynamic Models
– Model transitory behaviour
– Appropriate at component level
– For mean values, can be deduced from LQMs
– In general, built experimentally (hard)
– Enable system control
 Threshold and Policy Models
– Can capture rules of thumb and domain experience
– Can be deduced from the queuing or dynamic models
– Appropriate at any level, as a default model
– Simple and fast
– Hard to maintain…
Conclusions
 Hierarchical control
– Solves the problem of timescale and scope
– Provides flexibility in the choice of control algorithms
– Supports accelerated decision making by LQMs at upper levels
 Self-optimization = component tuning + application tuning (load balancing) + provisioning
 Models for self-optimization
– Threshold or policy-based models
– Linearized dynamic models
– Network or layered queuing models
 Further Work:
– End to end evaluation
– Optimization
– Stability
Backup slides
Model Builder
 Prediction: the model, parameterized by the current estimate x (with sensitivities H), produces the predicted performance ŷ of the system.
 Feedback: the monitor supplies the measured performance z (the new observation vector); a filter compares it with the prediction and corrects the parameters.
– e = z − ŷ : prediction error
– x̂_old : previous estimate of x
– x̂_new = x̂_old + K (z − ŷ)
– Output: x, P (the new parameters and their covariances)
(A minimal update sketch follows.)
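A minimal sketch of this prediction/feedback correction in Kalman-filter form; the toy linear model h, the noise covariances, and all numbers in the example are illustrative assumptions:

```python
import numpy as np

def update(x_old, P, z, h, H, R):
    """One feedback step: correct model parameters x from measured performance z.

    x_old: current parameter estimate (e.g. service demands)   P: its covariance
    z: measured performance   h: model mapping x to predicted performance ŷ
    H: sensitivities dŷ/dx at x_old   R: measurement-noise covariance
    """
    y_hat = h(x_old)                               # predicted performance ŷ
    e = z - y_hat                                  # prediction error e = z - ŷ
    S = H @ P @ H.T + R                            # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)                 # gain
    x_new = x_old + K @ e                          # x̂_new = x̂_old + K (z - ŷ)
    P_new = (np.eye(len(x_old)) - K @ H) @ P       # updated covariance
    return x_new, P_new

# Toy example: predicted response time ≈ 100 users * demand; a 2.4 s measurement nudges the demand up
h = lambda x: np.array([100.0 * x[0]])
x, P = update(np.array([0.02]), np.eye(1) * 1e-4, np.array([2.4]),
              h, H=np.array([[100.0]]), R=np.array([[0.01]]))
print(x)   # slightly above 0.02
```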
Dynamic Models Link AC to System Control
 Automatic Control
– 1789: Watt’s fly-ball governor (experimental AC)
– 1878: Maxwell’s “On Governors”
– 1931: Black & Nyquist’s electronic negative-feedback amplifier (classic AC)
– 1952: Bellman’s optimal control (dynamic programming)
– 1960: Kalman filter (modern AC)
 Autonomic Computing
– 1970s: time-sharing OS
– 1978: Cerf et al., TCP/IP
– 2001: Autonomic Computing (AC)
[Figure: Watt’s fly-ball governor with steam valve.]
Solvers for LQMs
 Algorithms
– Layers of QNMs
– The output of a lower layer is used as input for the upper layer
– Continue until a fixed point is reached (sketched below)
 Public solvers
– Method of Layers (Rolia, Sevcik): based on the Linearizer approximation algorithm
– LQNS (Woodside): based on approximate MVA
– APERA (Litoiu): based on approximate MVA, two layers, on alphaWorks
[Figure: stack of layers, Layer 2 and Layer 1 (software) above Layer 0 (hardware).]
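A self-contained sketch of the layered fixed-point idea for the two-layer case, with exact MVA as the per-layer solver; the demands, thread-pool size and think time are illustrative assumptions, and this is not the Method of Layers, LQNS, or APERA:

```python
def mva(demands, n, think=0.0):
    """Exact single-class MVA for a closed QNM, with optional client think time."""
    Q = [0.0] * len(demands)
    R, X = sum(demands), 0.0
    for k in range(1, n + 1):
        Ri = [d * (1 + q) for d, q in zip(demands, Q)]
        R = sum(Ri)
        X = k / (think + R)
        Q = [X * r for r in Ri]
    return R, X

def solve_two_layers(hw_demands, n_clients, threads, think=1.0, max_iter=50):
    """Layer 0: CPU/disk QNM; Layer 1: the software server as one queue whose
    demand is the Layer-0 response time; iterate to a fixed point."""
    concurrency, R_sw, X = 1, 0.0, 0.0
    for _ in range(max_iter):
        R_hw, _ = mva(hw_demands, concurrency)           # lower-layer result...
        R_sw, X = mva([R_hw], n_clients, think=think)    # ...feeds the upper layer
        busy = max(1, min(threads, round(X * R_sw)))     # busy threads (Little's law)
        if busy == concurrency:                          # fixed point reached
            break
        concurrency = busy
    return R_sw, X

print(solve_two_layers([0.010, 0.006], n_clients=100, threads=20))
```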
LQMs: Predicting the Application Response Time
[Figure: predicted response time R (ms) for five workload mixes of 800 users, each mix concentrating 796 users in one of the five classes (N: 796 1 1 1 1, …, N: 1 1 1 1 796).]
• Depending on the class mix, response time varies widely even when the number of users is constant (800)
• Each class may reach its maximum response time for a different workload mix
• Workload mixes produce changes in the bottlenecks
LQMs: Predicting the Threading Level in WAS*
[Figure: 100 clients; monitored number of threads with HTTPConnection: Keep-Alive, estimated average number of threads, and monitored number of threads with HTTPConnection: close.]
* From Litoiu, M., “Migrating to Web services: a performance engineering approach,” Journal of Software Maintenance and Evolution: Research and Practice, No. 16, pp. 51-70, 2004.
Component tuning
 Example of control parameters
– Threading level
– Admission control
– Scheduling
– Session length
– Live versus closed connections
 Specialized controllers, specialized decision making
 How general can you be at the component level?
 How can one systematically inject variability at this level?
[Figure: component-level control loop: the setpoint from the application-tuning level feeds the Controller (Analyze, Optimize); the Monitor (Model) reads the observation y from the component's Sensor; Execute applies the control u through the Effector; z is the disturbance on the Component.]
Application tuning
 Load balancing
 Horizontal scaling, vertical scaling
 Inter-component optimization (e.g. the DB2 connection pool in WAS)
 Component allocation
 Scheduling
 Controllers and models become more complex and more general
[Figure: application-level control loop: the setpoint from the provisioning level feeds the Controller (Analyze, Optimize); the Monitor (Model) reads the observation y from the Sensor of the application (software components); Execute applies the control u through the Effector; z is the disturbance.]
Provisioning
 Hardware and software provisioning
– Add/remove hardware
– Add/remove software components
 Needs long-term prediction
[Figure: provisioning-level control loop: SLAs feed the Controller (Analyze, Optimize); the Monitor (Model) reads the observation y from the Sensor of the applications; Execute applies the control u through the Effector; z is the disturbance.]
Building Accurate Models
“If the map and the terrain don’t match, trust the terrain.”
Swiss Army Rule
THANKS!