Partitioning Strategies: Spatiotemporal Patterns of

The SEEC
Computational Model
Henry Hoffmann, Anant Agarwal
PEMWS-2
April 6, 2011
In the beginning…*
Application programmers had one goal:
Performance
*The beginning, in this case, refers to the beginning of my career (1999)
2
But Modern Systems Have Increased the
Burden on Application Programmers
Lo
Hi
Power
Power
Many additional
constraints
Performance
beat/s
Even worse, constraints can change dynamically
E.g. power cap, workload fluctuation, core failure
3
Most Execution Models Designed for
Performance
Coherent Shared Memory
Message Passing
Global Shared Cache
data
Local
Cache
Local
Memory
Local
Cache
data
data
data
store
load
RF
RF
Core 0
Core 1
Local
Memory
data
store
RF
Network
Interface
Network
load
Network
Interface
Core 0
RF
Core 1
Concurrency
Multi-threaded
Multi-process
Communication
Through Memory
Through Network
Coordination
Locks
Messages
Control
Procedural
Procedural
Procedural control insufficient to meet the needs of modern systems
5
Angstrom Replaces Procedural Control
with Self-Aware Control
Procedural Control
Self-Aware Control
Decide
Observe
Act
Decide
Act
• Run in open loop
• Run in closed loop
• Assumptions made at design time
• Based on guesses about future
• Understand user goals
• Monitor the environment
• Application optimized for system
• No flexibility to adapt to changes
• System optimizes for application
• Flexibly adapt behavior
The self-aware model allows the system
to solve constrained optimization problems dynamically
6
Outline
• Introduction/Motivating Example
• The SEEC Model and Implementation
• Experimental Validation
• Conclusions
7
Angstrom’s SElf-awarE Computing
(SEEC) Model
• Goal:
Reduce programmer burden by continuously optimizing online
• Key Features:
1. Applications explicitly state goals and progress
2. System software and hardware state available actions
3. The SEEC runtime system dynamically selects actions to maintain goals
Application
Developer
Observe
Act
API
SPI
Systems
Developer
Decide
SEEC
Runtime
A unified decision engine adapts applications, system software, and hardware
8
Example Self-Aware System
Built from SEEC
Observe
Video Encoder
20 b/s
30
33
Goals:
30 beat/s,
Minimize Power
Decide
Control and Learning System
Act
Actuators
Initial Model
8
7
_
r
Speedup
Heart Rate
9
pure delay
slow convergence
oscillating
6
5
4
3
2
1
0
5
Time
Desired
Heart
Rate
_
r
-
Error
e(k)
10
15
20
Action
SEEC
Controller
Speedup
25
30
Algorithm
Cores
Frequency
Bandwidth
Application
s(k)
Observed Heart Rate
r(k)
9
Roles in the SEEC Model
Application
Developer
Observe
Systems
Developer
Express application
goals and progress
(e.g. frames/ second)
Read goals and
performance
Determine how to adapt
(e.g. How much to
speed up the
application)
Decide
Act
SEEC
Runtime System
Provide a set of actions
and a callback function
(e.g. allocation of cores
to process)
Initiate actions based
on results of decision
phase
10
Registering Application Goals
Observe
Application
Performance
Quality
Lo
Hi
SEEC
Decision Engine
Power
Power
•
Performance
– Goals: target heart rate and/or latency between tagged heartbeats
– Progress: issue heartbeats at important intervals
•
Quality
– Goals: distortion (distance from application defined nominal value)
– Progress: distortion over last heartbeat
•
Power
– Goals: target heart rate / Watt and/or target energy between tagged heartbeats
– Progress: Power/energy over last heartbeat interval
Research to date focuses on meeting performance while minimizing power/maximizing quality
11
Registering System Actions
Act
Actuators
Estimated Speedup
SEEC
Decision Engine
Cost
Algorithm
Cores
Frequency
Bandwidth
Callback
Each action has the following attributes:
• Estimated Speedup
– Predicted benefit of taking an action
• Cost
– Predicted downside of taking an action (increased power, lowered quality)
• Callback
– A function that takes an id an implements the associated action
12
Picking Actions to
Meet Goals
Application
H
i
System
Services
SEEC
Decision Engine
Performance
Goal
L
o
Decide
-
Controller
(Decide)
Actuator
(Act)
Application
(Observe)
Current
Performance
Power
Decisions are made to select actions given observations:
• Read application goals and heartbeats
• Determine speedup with control system
• Translate speedup into one or more actions
The control system provides predictable and analyzable behavior
13
Optimizing Resource
Allocation with SEEC
• SEEC can observe, decide and act
• How does this enable optimal resource allocation?
• Let’s implement the video encoder example from the
introduction
14
Performance/Watt Adaptation
in Video Encoding
Time (s)
0
2
4
6
8
10
12
14
16
80
180
Performance
Power
60
170
50
160
40
30
150
Power (W)
Performance (Frame/s)
70
Performance goal
20
140
10
0
130
50
150
250
Time (Heartbeat)
350
450
15
System Models in SEEC
Initial Model
9
8
Speedup
7
6
5
4
3
2
1
0
5
10
15
20
25
30
Action
SEEC’s control system takes actions based on models (of
speedup and cost per action) associated with actions
What if the models are inaccurate?
16
Updating Models in SEEC
Application
Adaptation Level 2
SEEC Decision Engine
Adaptation Level 1
System
Model
(Decide)
Decide
System
Services
Application
Model
(Decide)
Adaptation Level 0
L
o
H
i
Power
Performance
Goal
-
Controller
(Decide)
Actuator
(Act)
Application
(Observe)
Current
Performance
• After every action, SEEC updates application and system models
• Kalman filters used to estimate true state of application and system
– We are exploring a number of different alternatives here
SEEC combines predictability of control systems with adaptability of
learning systems
17
SEEC Online Learning of Speedup Model
for Application with Local Minima
Initial Model
Actual Speedups
Learned Speedups
9
8
Speedup
7
6
5
4
3
2
1
0
5
10
15
20
25
30
Action
18
Handling Multiple Applications
Application
Application
Application
Application
Adaptation Level 2
SEEC Decision Engine
Adaptation Level 1
L
o
System
Services
Application
Model
(Decide)
H
i
Power
L
o
System
Model
(Decide)
Adaptation Level 0
H
i
L
o
Decide
H
i
Power
L
o
H
i
Power
Performance
Goal
-
Controller
(Decide)
Actuator
(Act)
Application
(Observe)
Power
Current
Performance
• Control actions computed separately for each application
• For finite resources, several alternatives:
• Priorities (for real-time, admission control)
• Weighting/Centroid method (for overall system throughput)
19
Outline
• Introduction/Motivating Example
• The SEEC Model and Implementation
• Experimental Validation
• Conclusions
20
Systems Built with SEEC
System
Actions
Tradeoff
Benchmarks
Dynamic Loop
Perforation
Skip some loop iterations
Performance vs.
Quality
7/13 PARSECs
Dynamic Knobs
Make static parameters
dynamic
Performance vs.
Quality
bodytrack, swaptions,
x264, SWISH++
Core Scheduler
Assign N cores to
application
Compute vs. Power
11/13 PARSECs
Clock Scaler
Change processor speed
Compute vs. Power
11/13 PARSECs
Bandwidth Allocator
Assign memory controllers
to application
Memory vs. Power
STREAM
(doesn’t make a
difference for PARSEC)
Power Manager
Combination of the three
above
Performance vs.
Power
PARSEC, STREAM,
simple test apps
(mergesort, binary search)
Learned Models
Power Manager with
speedup and cost learned
online
Performance vs.
Power
PARSECs
Multi-App Control
Power Manager with
multiple applications
Performance vs.
Power and Quality for
multiple applications
Combinations of
PARSECs
21
Systems Built with SEEC
System
Actions
Tradeoff
Benchmarks
Dynamic Loop
Perforation
Skip some loop iterations
Performance vs.
Quality
7/13 PARSECs
Dynamic Knobs
Make static parameters
dynamic
Performance vs.
Quality
bodytrack, swaptions,
x264, SWISH++
Core Scheduler
Assign N cores to
application
Compute vs. Power
11/13 PARSECs
Clock Scaler
Change processor speed
Compute vs. Power
11/13 PARSECs
Bandwidth Allocator
Assign memory controllers
to application
Memory vs. Power
STREAM
(doesn’t make a
difference for PARSEC)
Power Manager
Combination of the three
above
Performance vs.
Power
PARSEC, STREAM,
extra test apps
(mergesort, binary search)
Learned Models
Power Manager with
speedup and cost
learned online
Performance vs.
Power
PARSECs
Multi-App Control
Power Manager with
multiple applications
Performance vs.
Power and Quality
for multiple
applications
Combinations of
PARSECs
22
Dynamic Knobs:
Creating Adaptive Applications
Turn static command line parameters into dynamic structure
Application Goals
Maintain performance and minimize quality loss
System Actions
Adjust memory locations to change application
settings
Benchmarks: bodytrack, swaptions, SWISH++, x264
Experiment
Maintain performance when clock speed changes
Detail in Hoffmann et al. “Dynamic Knobs for Power Aware Computing” ASPLOS 2011
23
Results
Enabling Dynamic Applications
bodytrack
Baseline
Power Cap
Dynamic Knobs
Normalized Performance
1.4
Clock rises
1.6-2.4 GHz
1.2
1
0.8
0.6
Clock drops
2.4-1.6GHz
0.4
0.2
w/ SEEC perf.
recovers
w/o SEEC
perf. drops
0
0
50
100
150
Time
200
SEEC returns
quality to
baseline
250
24
Results
Enabling Dynamic Applications
SWISH++
Baseline
Power Cap
swaptions
Dynamic Knobs
Baseline
Baseline
Dynamic Knobs
1
0.8
0.6
0.4
0.2
0
0
200
400
600
800
1000
Time
Maintains performance
despite noise
Normalized Performance
1.2
Power Cap
Dynamic Knobs
1.4
1.4
Normalized Performance
Normalized Performance
1.4
Power Cap
x264
1.2
1
0.8
0.6
0.4
0.2
1.2
1
0.8
0.6
0.4
0.2
0
0
0
100
200
300
400
Time
Perfect behavior
500
0
100
200
300
400
500
600
Time
Maintains baseline
performance
Dynamic knobs automatically enable dynamic response for a
range of applications using a single mechanism
25
Optimizing Performance per
Watt for Video Encoding
Adapt system behavior to needs of individual inputs
Application Goals
System Actions
Maintain 30 frame/s while minimizing power
Change cores, clock speed, and memory
bandwidth
Benchmark: x264 w/ 16 different 1080p inputs
Experiment
Compare performance/Watt w/ SEEC to best static
allocation of resources for each input
26
Average Performance per Watt for
Video Encode
Normalized Performance/Watt
static worst
static oracle
AL 0
AL 1
AL 2
1.2
1
0.8
0.6
0.4
0.2
0
Input
27
Normalized Performance/Watt
Performance per Watt for All Inputs
1.2
static worst
static oracle
AL 0
AL 1
AL 2
1
0.8
0.6
0.4
0.2
0
Input
28
Learning Models Online
Adapt system behavior when initial models are wrong
Four targets: Min, Median, Max, Max+
Application Goals
Minimize power consumption for each target
Change cores, clock speed, and mem. bandwidth
System Actions
Initial models are incredibly optimistic
(Assume linear speedup with any resource increase)
Benchmark: STREAM
Experiment
Converge to within 5% of target performance,
measure power and convergence time
29
Complex Response to Resources
2.4 GHz, 2 MC
2.4 GHz, 1 MC
2.6
Can 1 MC
1.6 Ghz, Adding
2 MC Cores
1.6 GHz,
Slow the App Down
Adding Memory
Bandwidth Can Slow
Down
2.4
2.2
Speedup
2
1.8
1.6
1.4
1.2
1
0.8
1
2
3
4
5
6
7
8
Cores
30
Power Savings
Oracle
AL 0
AL 1
AL 2
Total System Power (Watts)
250
AL 2 can provide
30 Watt power
savings over AL1
200
150
100
50
0
Min
Median
Max
Performance Target
Max+
31
Convergence Time
Oracle
AL 0
AL 1
AL 2
Convergence Time (quanta)
60
50
40
30
20
10
0
Min
Median
Max
Performance Target
Max+
32
Managing Application and System
Resources Concurrently
Manage multiple applications when clock frequency changes
bodytrack: maintain performance, minimize power
Application Goals
x264: maintain performance, minimize quality loss
Change core allocation to both applications
System Actions
Experiment
Change x264’s algorithms
Maintain performance of both applications when
clock frequency changes
33
Results
SEEC Management of Multiple Applications
x264
SEEC allocates
cores to bodytrack
2.5
7
SEEC removes cores
7
from x264
2.5
x264 w/ adaptation
x264
x264 cores
5
1.5
4
3
1
2
bodytrack w/ adaptation
bodytrack
0.5
Normalized Performance
2
Cores
Normalized Performance
6
2
6
5
1.5
4
3
1
2
0.5
1
1
bodytrack cores
0
0
40
90
Clock drops
2.4-1.6GHz
140
190
240
0
0
40
Time (Heartbeat)
w/o SEEC app
misses goals
90
140
190
240
Time (Heartbeat)
w/o SEEC app
exceeds goals
SEEC adjusts algorithm
to meet goals
34
Cores
bodytrack
Summary of Case Studies
Experiment
Demonstrated Benefit of SEEC
Dynamic Knobs
Maintains application performance in the face of loss of
compute resources
Performance/Watt
Out-performs oracle for static allocation of resources
by adapting to fluctuations in input data
Performance/Watt with
learning
Learns models online starting from initial model which
is ~10x off true values
Multi-App control
Maintains performance of multiple apps by managing
algorithm and system resources to adapt to loss of
compute resources
35
Outline
• Introduction/Motivating Example
• The SEEC Framework
• Experimental Validation
• Conclusions
36
Angstrom’s Hardware
Support for Self-Awareness
• “If you want to change something, measure it.”
• We need sensors to measure everything we want to
manage:
–
–
–
–
–
Performance (there is a rich body of work in this area)
Power
Quality
Resiliency
…
37
Need Performance-level
Support for Power Analysis
1. gprof -> pprof?
index % time
self children
called name
Power??
<spontaneous>
----------------------------------------------0.00 0.05
1/1
main [2]
[1] 100.0 0.00 0.05
1
report [3]
0.00 0.03
8/8
timelocal [6]
0.00 0.01
1/1
print [9]
0.00 0.01
9/9
fgets [12]
0.00 0.00
12/34
strncmp <cycle 1> [40]
0.00 0.00
8/8
lookup [20]
0.00 0.00
1/1
fopen [21]
0.00 0.00
8/8
chewtime [24]
0.00 0.00
8/16
skipspace [44]
----------------------------------------------[2] 59.8 0.01
0.02
8+472 <cycle 2 as a whole> [4]
0.01
0.02 244+260
offtime <cycle 2> [7]
0.00
0.00 236+1
tzset <cycle 2> [26]
-----------------------------------------------
Energy???
38
Angstrom’s Hardware
Support for SEEC
Low-power core for SEEC runtime system
39
Angstrom’s System Software and
Hardware Support for SEEC
Actuators
Algorithm
Cores
Cache
Network Bandwidth
TLB Size
Frequency
Memory Bandwidth
Registers
ALU Width
…
Enable tradeoffs wherever possible
Support fine-grain allocation of resources
40
Conclusions
• SEEC is designed to help ease programmer burden
– Solves resource allocation problems
– Adapts to fluctuations in environment
• SEEC has three distinguishing features
– Incorporates goals and feedback directly from the application
– Allows independent specification of adaptation
– Uses an adaptive second order control system to manage adaptation
• Demonstrated the benefits of SEEC in several experiments
– SEEC can optimize performance per Watt for video encoding
– SEEC can adapt algorithms and resource allocation to meet goals in the
face of power caps or other changes in environment
• SEEC navigates tradeoff spaces
– So build more tradeoffs into apps, system software, system hardware, etc.
41
SEEC References
• Application Heartbeats Interface:
–
Hoffmann, Eastep, Santambrogio, Miller, Agarwal. Application Heartbeats: A Generic
Interface for Specifying Program Performance and Goals in Autonomous Computing
Environments. ICAC 2010
• Controlling Heartbeat-enabled Systems:
–
–
Maggio, Hoffmann, Santambrogio, Agarwal, Leva. Controlling software applications within
the Heartbeats framework. CDC 2010
Maggio, Hoffmann, Agarwal, Leva. Control theoretic allocation of CPU resources. FeBID
2011
• Creating Adaptive Applications:
–
–
Hoffmann, Misailovic, Sidiroglou, Agarwal, Rinard. Using Code Perforation to Improve
Performance, Reduce Energy Consumption, and Respond to Failures. MIT-CSAIL-TR-2209042. 2009
Hoffmann, Sidiroglou, Carbin, Misailovic, Agarwal, Rinard. Dynamic Knobs for Power-Aware
Computing. ASPLOS 2011.
• The SEEC Framework:
–
–
Hoffmann, Maggio, Santambrogio, Leva, Agarwal. SEEC: A Framework for Self-aware
Management of Multicore Resources. MIT-CSAIL-TR-2011-016 2011.
Hoffmann, Maggio, Santambrogio, Leva, Agarwal. SEEC: A Framework for Self-aware
Computing. MIT-CSAIL-TR-2010-049 2010.
42