PowerPoint Template

Learning-Based Power Management
for Multi-Core Processors
YE Rong
Nov 15, 2011
CUHK
Background
 Dynamic Power Management (DPM)
 Power consumption has become a major issue in system design
 DPM is defined as the selective shutdown of inactive system
components
StrongARM SA1100
 Different power modes
• Different power consumption
• Different transition cost to wake up (deeper sleep, more transition cost)
CUHK
Background
 Dynamic Power Management (DPM)
 The key point in DPM is how to utilize the idle periods of system
components
• Difficult to choose the opportune moment to turn off inactive components
• Difficult to select proper sleep mode
Running mode
Sleep mode
idle mode
running mode
The idle period is unknown in general-purpose processors.
CUHK
Related Work
 Existing DPM policies
 Heuristic policies
• Timeout Policy
 Predict the length of idle period (e.g., regression)
• Input: past idle periods
• Output: current idle period
 Stochastic policies
• Markov decision process
• Semi-Markov decision process
CUHK
Motivation
 System and environment conditions are varying
 Learning technique can adjust system DPM policy to adapt to this
changing
 New feature to utilize
 We are provided a choice to re-allocate task to a certain core
running
sleep
running
sleep
 Learning-based DPM for Multi-Core Processors
 Model this problem as stochastic decision process
 Employ Q-learning algorithm to find out better policy
CUHK
Q-learning
 Basic idea of Q-learning
 Three components
• A discrete set of environment state, S  {st }
• A discrete set of agent actions, A  {a t }
• A reward function, R : S  A  R
 Map system states to actions to maximize the expected reward in the
future
CUHK
Reward Function
 Proposed reward function
 Weighted sum of power consumption and response time
R(st , at ,  )  P( st , at )    RT ( st , at )
R: reward; P: average power dissipation; RT: average response time
β: coefficient for the tradeoff between power and performance
CUHK
Neural Network
 Neural network to tackle new problem in multi-core case
 System state and action numbers are increased exponentially
• System state number (cn  qn) n
n
• Action number ( m  n)
• Solution space: the composition of system state space and action space
(cn  qn)n  (mn  n)
 For example,
•
•
•
•
Core with 3 power modes: running, idle, and sleep mode
Queue with 2 queue states: 0 for no task, otherwise 1
A multi-core processor with 8 cores
Then, the space size is
(3  2)8  (38  8)  88,159,684,608
 Use neural network to approximate Q-value, instead of Q table in
standard Q-learning
CUHK
Neural Network




Input layer: (st , at )
 (  s w   a w )
)
Hidden layer: H j  1/ (1  eh
Output layer: Q( st , at )   H j  u j
i 1
Parameters update
n
m
i
i 1
ij
i
ij
i 1
 Parameters are updated in a gradient way with the help of the backpropagation algorithm
CUHK
Experimental Setup
 Our experiments are based on synthetic workloads
 Homogeneous multi-core processors with 4 cores or 8 cores
 Processor parameters
• Exactly the same with StrongARM SA1100
 A set of tasks
• Task number: 10,000
• Task service time: hyper exponential distribution
• Task arrival time: exponential distribution
 DPM techniques for comparison
 Timeout policy with T = Tbe (TVLSI, 1996)
 Reinforcement learning-based policy
(ICCAD, 2009)
CUHK
Results
 Energy
CUHK
Results
 Response Time
CUHK
Conclusion
 In this work,
 We extend DPM problem to multi-core case
• Present a novel DPM model to utilize the distinctive features of multi-core
processors, to obtain global power benefits;
 We develop a learning-based algorithm
• Use neural network to solve the huge solution space problem that is different
from single-core case
• Employ ε-greedy to avoid local maximum of power optimization
 The experimental results show that
• Our proposed methodology can greatly reduce power dissipation with
reasonable performance penalty or even obtain benefits in both power and
performance
 Power is a critical issue
 Related to temperature, lifetime reliability, etc.
 This framework can be utilized to optimize reliability issues
CUHK
Thanks for Your Attention!
YE Rong
Nov 15, 2011
CUHK