Learning-Based Power Management for Multi-Core Processors
YE Rong
Nov 15, 2011
CUHK

Background: Dynamic Power Management (DPM)
Power consumption has become a major issue in system design.
DPM is defined as the selective shutdown of inactive system components.
Example: the StrongARM SA1100 offers several power modes, each with
• different power consumption, and
• a different transition cost to wake up (the deeper the sleep, the higher the transition cost).

The key question in DPM is how to exploit the idle periods of system components:
• It is difficult to choose the opportune moment to turn off an inactive component.
• It is difficult to select the proper sleep mode.
(Timeline: running mode → idle → sleep mode → running mode.)
In general-purpose processors, the length of an idle period is not known in advance.

Related Work: Existing DPM Policies
Heuristic policies
• Timeout policy.
• Predictive policy: predict the length of the idle period (e.g., via regression), taking past idle periods as input and producing an estimate of the current idle period as output.
Stochastic policies
• Markov decision process.
• Semi-Markov decision process.

Motivation
System and environment conditions keep varying; a learning technique can adjust the system's DPM policy to adapt to these changes.
There is also a new feature to exploit: a multi-core processor gives us the choice of re-allocating a task to a particular core.
(Diagram: per-core running/sleep timelines.)
Our approach: learning-based DPM for multi-core processors.
• Model the problem as a stochastic decision process.
• Employ the Q-learning algorithm to find a better policy.

Q-learning
Basic idea of Q-learning: three components
• A discrete set of environment states, S = {s_t}.
• A discrete set of agent actions, A = {a_t}.
• A reward function, R : S × A → ℝ.
The goal is to map system states to actions so as to maximize the expected future reward. (A minimal code sketch appears after the Results section.)

Reward Function
The proposed reward function is a weighted sum of power consumption and response time:
R(s_t, a_t) = P(s_t, a_t) + β · RT(s_t, a_t)
where R is the reward, P is the average power dissipation, RT is the average response time, and β is the coefficient that trades off power against performance.

Neural Network
In the multi-core case, the numbers of system states and actions grow exponentially:
• System-state count: (c · q)^n, for n cores, each with c power modes and a task queue with q states.
• Action count: m^n · n, for m selectable power modes per core times the n choices of core to which a task can be allocated.
• Solution space: the composition of the system state space and the action space, of size (c · q)^n × (m^n · n).
For example, with cores that have 3 power modes (running, idle, and sleep), queues with 2 states (0 for no task, 1 otherwise), and a multi-core processor with 8 cores, the space size is (3 · 2)^8 × (3^8 × 8) = 88,159,684,608.
We therefore use a neural network to approximate the Q-value, instead of the Q-table of standard Q-learning. (See the size check and the network sketch after the Results section.)

The network is structured as follows:
• Input layer: (s_t, a_t).
• Hidden layer: H_j = 1 / (1 + e^(−h_j)), where h_j = Σ_{i=1..n} s_i · w_ij + Σ_{i=1..m} a_i · w′_ij.
• Output layer: Q(s_t, a_t) = Σ_j H_j · u_j.
The parameters are updated in a gradient fashion with the help of the backpropagation algorithm.

Experimental Setup
Our experiments are based on synthetic workloads, run on homogeneous multi-core processors with 4 or 8 cores.
Processor parameters
• Identical to those of the StrongARM SA1100.
Task set
• Number of tasks: 10,000.
• Task service times: hyper-exponential distribution.
• Task inter-arrival times: exponential distribution.
DPM techniques for comparison
• Timeout policy with T = T_be (TVLSI, 1996).
• Reinforcement learning-based policy (ICCAD, 2009).

Results
• Energy (comparison chart).
• Response time (comparison chart).
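To make the Q-learning formulation and the reward function concrete, here is a minimal tabular sketch of ε-greedy Q-learning with the weighted power/response-time reward. All constants (ALPHA, GAMMA, EPSILON, BETA) are illustrative assumptions, not values from the talk; the sign flip on the reward is also an assumption, made so that maximizing reward minimizes cost. The actual work replaces this table with the neural network sketched below.

```python
import random
from collections import defaultdict

# Illustrative constants -- the talk gives no concrete values.
ALPHA, GAMMA, EPSILON, BETA = 0.1, 0.9, 0.1, 0.5

Q = defaultdict(float)   # Q[(state, action)] -> estimated future reward

def reward(power, resp_time):
    # Weighted sum of power and response time; negated here (an
    # assumption) so that maximizing reward minimizes the cost.
    return -(power + BETA * resp_time)

def choose_action(state, actions):
    # epsilon-greedy: explore with probability EPSILON, else exploit.
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, r, next_state, actions):
    # Standard one-step Q-learning update rule.
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (r + GAMMA * best_next - Q[(state, action)])
```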
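The solution-space arithmetic from the Neural Network slide can be checked directly (variable names c, q, m, n are as reconstructed above):

```python
c, q, m, n = 3, 2, 3, 8     # power modes, queue states, selectable modes, cores
states  = (c * q) ** n      # (3*2)^8 = 1,679,616 system states
actions = (m ** n) * n      # 3^8 * 8 = 52,488 actions
print(states * actions)     # 88,159,684,608 (state, action) pairs
```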
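And here is a minimal sketch of the one-hidden-layer Q-value approximator described on the Neural Network slide: sigmoid hidden units, a linear output Q(s, a) = Σ_j H_j · u_j, and one backpropagation step toward a supplied target. Concatenating the state and action inputs into a single weight matrix is equivalent to the slide's two weighted sums (w_ij and w′_ij); the layer sizes, learning rate, and squared-error loss are illustrative assumptions.

```python
import numpy as np

class QNetwork:
    """One-hidden-layer approximator: Q(s, a) = sum_j H_j * u_j,
    with H_j = sigmoid(h_j) and h_j a weighted sum of the (s, a) inputs."""

    def __init__(self, n_state, n_action, n_hidden, lr=0.01):
        rng = np.random.default_rng(0)
        # Single input weight matrix covering both state and action inputs.
        self.W = rng.normal(0.0, 0.1, (n_state + n_action, n_hidden))
        self.u = rng.normal(0.0, 0.1, n_hidden)   # hidden-to-output weights
        self.lr = lr

    def forward(self, s, a):
        self.x = np.concatenate([s, a])                  # input layer (s_t, a_t)
        self.H = 1.0 / (1.0 + np.exp(-(self.x @ self.W)))  # sigmoid hidden layer
        return float(self.H @ self.u)                    # linear output = Q(s, a)

    def update(self, s, a, target):
        # One gradient step on the squared error toward the TD target.
        err = target - self.forward(s, a)
        grad_u = -err * self.H
        grad_W = -err * np.outer(self.x, self.u * self.H * (1.0 - self.H))
        self.u -= self.lr * grad_u
        self.W -= self.lr * grad_W
```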
Conclusion
In this work, we extend the DPM problem to the multi-core case:
• We present a novel DPM model that exploits the distinctive features of multi-core processors to obtain global power benefits.
We develop a learning-based algorithm:
• We use a neural network to cope with the huge solution-space problem, which does not arise in the single-core case.
• We employ ε-greedy action selection to avoid getting stuck in local optima of the power optimization.
The experimental results show that our proposed methodology can greatly reduce power dissipation with a reasonable performance penalty, or can even obtain benefits in both power and performance.
Power is a critical issue, related to temperature, lifetime reliability, etc.; this framework can also be applied to optimize reliability.

Thanks for Your Attention!
YE Rong
Nov 15, 2011
CUHK