The SEEC Computational Model
Henry Hoffmann, Anant Agarwal
PEMWS-2, April 6, 2011

In the beginning…*
Application programmers had one goal: Performance
*The beginning, in this case, refers to the beginning of my career (1999)

But Modern Systems Have Increased the Burden on Application Programmers
[Figure: gauges for performance (beat/s) and power (Lo to Hi)]
• Many additional constraints
• Even worse, constraints can change dynamically
  – E.g. power cap, workload fluctuation, core failure

Most Execution Models Designed for Performance
[Figure: two-core diagrams of a coherent shared memory machine (register files, local caches, global shared cache) and a message passing machine (register files, local memories, network interfaces, network)]
              | Coherent Shared Memory | Message Passing
Concurrency   | Multi-threaded         | Multi-process
Communication | Through Memory         | Through Network
Coordination  | Locks                  | Messages
Control       | Procedural             | Procedural
Procedural control insufficient to meet the needs of modern systems

Angstrom Replaces Procedural Control with Self-Aware Control
Procedural Control (Decide, Act):
• Run in open loop
• Assumptions made at design time
• Based on guesses about future
• Application optimized for system
• No flexibility to adapt to changes
Self-Aware Control (Observe, Decide, Act):
• Run in closed loop
• Understand user goals
• Monitor the environment
• System optimizes for application
• Flexibly adapt behavior
The self-aware model allows the system to solve constrained optimization problems dynamically

Outline
• Introduction/Motivating Example
• The SEEC Model and Implementation
• Experimental Validation
• Conclusions

Angstrom’s SElf-awarE Computing (SEEC) Model
• Goal: Reduce programmer burden by continuously optimizing online
• Key Features:
  1. Applications explicitly state goals and progress
  2. System software and hardware state available actions
  3. The SEEC runtime system dynamically selects actions to maintain goals
[Figure: Application Developer (Observe, through the API) and Systems Developer (Act, through the SPI) connected by the SEEC Runtime (Decide)]
A unified decision engine adapts applications, system software, and hardware

Example Self-Aware System Built from SEEC
• Observe: video encoder reporting its heart rate (e.g. 20, 30, 33 beat/s)
• Goals: 30 beat/s, Minimize Power
• Decide: Control and Learning System
• Act: Actuators (Algorithm, Cores, Frequency, Bandwidth)
[Figure: initial model, speedup versus action]
[Figure: heart rate versus time, with responses labeled pure delay, slow convergence, and oscillating]
[Figure: feedback loop. The desired heart rate minus the observed heart rate r(k) gives the error e(k); the SEEC controller outputs the speedup s(k) applied to the application.]

Roles in the SEEC Model
• Observe
  – Application Developer: Express application goals and progress (e.g. frames/second)
  – SEEC Runtime System: Read goals and performance
• Decide
  – SEEC Runtime System: Determine how to adapt (e.g. how much to speed up the application)
• Act
  – Systems Developer: Provide a set of actions and a callback function (e.g. allocation of cores to process)
  – SEEC Runtime System: Initiate actions based on results of decision phase

Registering Application Goals
[Figure: Observe. The application reports performance, quality, and power to the SEEC Decision Engine]
• Performance
  – Goals: target heart rate and/or latency between tagged heartbeats
  – Progress: issue heartbeats at important intervals
• Quality
  – Goals: distortion (distance from application defined nominal value)
  – Progress: distortion over last heartbeat
• Power
  – Goals: target heart rate / Watt and/or target energy between tagged heartbeats
  – Progress: Power/energy over last heartbeat interval
Research to date focuses on meeting performance while minimizing power/maximizing quality

Registering System Actions
[Figure: Act. Actuators (Algorithm, Cores, Frequency, Bandwidth) register estimated speedup, cost, and callback with the SEEC Decision Engine]
Each action has the following attributes:
• Estimated Speedup – Predicted benefit of taking an action
• Cost – Predicted downside of taking an action (increased power, lowered quality)
• Callback – A function that takes an id and implements the associated action

Picking Actions to Meet Goals
[Figure: Decide. The performance goal minus the current performance feeds the controller (Decide), which drives the actuator (Act) and the application (Observe)]
Decisions are made to select actions given observations:
• Read application goals and heartbeats
• Determine speedup with control system
• Translate speedup into one or more actions
The control system provides predictable and analyzable behavior

Optimizing Resource Allocation with SEEC
• SEEC can observe, decide and act
• How does this enable optimal resource allocation?
• Let’s implement the video encoder example from the introduction

Performance/Watt Adaptation in Video Encoding
[Figure: performance (frame/s) and power (W) versus time (in heartbeats, with seconds on the top axis), with the performance goal marked]

System Models in SEEC
[Figure: initial model, speedup versus action]
SEEC’s control system takes actions based on models (of speedup and cost per action) associated with actions
What if the models are inaccurate?
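The three decision steps listed under "Picking Actions to Meet Goals" can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SEEC implementation: the integral control law, the names `Action`, `Controller`, `pick_action`, and `run_demo`, the rule of choosing the cheapest sufficient action, and the toy application model are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One registered action: estimated speedup, cost, and a callback taking the action id."""
    action_id: int
    speedup: float
    cost: float
    callback: Callable[[int], None]


class Controller:
    """Integral controller (an assumed control law): s(k) = s(k-1) + e(k) / base_rate."""

    def __init__(self, goal_rate: float, base_rate: float) -> None:
        self.goal_rate = goal_rate
        self.base_rate = base_rate  # assumed heart rate at speedup 1.0
        self.speedup = 1.0

    def decide(self, observed_rate: float) -> float:
        # Error between the stated goal and the observed heart rate.
        error = self.goal_rate - observed_rate
        self.speedup = max(1.0, self.speedup + error / self.base_rate)
        return self.speedup


def pick_action(actions: List[Action], needed: float) -> Action:
    """Return the cheapest action whose estimated speedup meets the need, else the fastest."""
    feasible = [a for a in actions if a.speedup >= needed]
    if feasible:
        return min(feasible, key=lambda a: a.cost)
    return max(actions, key=lambda a: a.speedup)


def run_demo() -> List[int]:
    """Run five observe/decide/act rounds against a toy application."""
    applied: List[int] = []
    # Hypothetical actions: speedup i at cost i squared; the callback records the id.
    actions = [Action(i, float(i), float(i) ** 2, applied.append) for i in range(1, 5)]
    controller = Controller(goal_rate=30.0, base_rate=10.0)
    rate = 10.0  # observed heart rate before any action is taken
    for _ in range(5):
        needed = controller.decide(rate)               # decide
        action = pick_action(actions, needed)
        action.callback(action.action_id)              # act
        rate = controller.base_rate * action.speedup   # observe (toy model, assumed exact)
    return applied
```

Because the toy application behaves exactly as the registered speedups predict, the loop settles on the action with estimated speedup 3 in the first round and stays there. The interesting case is when the estimates are wrong, which is the question the last slide above raises.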
Updating Models in SEEC
[Figure: SEEC Decision Engine with Adaptation Levels 0, 1, and 2: a feedback loop of controller (Decide), actuator (Act), and application (Observe) on the performance goal and current performance, plus an application model (Decide) and a system model (Decide)]
• After every action, SEEC updates application and system models
• Kalman filters used to estimate true state of application and system
  – We are exploring a number of different alternatives here
SEEC combines predictability of control systems with adaptability of learning systems

SEEC Online Learning of Speedup Model for Application with Local Minima
[Figure: speedup versus action for the initial model, the actual speedups, and the learned speedups]

Handling Multiple Applications
[Figure: several applications, each with its own goals, sharing the SEEC Decision Engine (Adaptation Levels 0 to 2) and system services]
• Control actions computed separately for each application
• For finite resources, several alternatives:
  – Priorities (for real-time, admission control)
  – Weighting/Centroid method (for overall system throughput)

Outline
• Introduction/Motivating Example
• The SEEC Model and Implementation
• Experimental Validation
• Conclusions

Systems Built with SEEC
System | Actions | Tradeoff | Benchmarks
Dynamic Loop Perforation | Skip some loop iterations | Performance vs. Quality | 7/13 PARSECs
Dynamic Knobs | Make static parameters dynamic | Performance vs. Quality | bodytrack, swaptions, x264, SWISH++
Core Scheduler | Assign N cores to application | Compute vs. Power | 11/13 PARSECs
Clock Scaler | Change processor speed | Compute vs. Power | 11/13 PARSECs
Bandwidth Allocator | Assign memory controllers to application | Memory vs. Power | STREAM (doesn’t make a difference for PARSEC)
Power Manager | Combination of the three above | Performance vs. Power | PARSEC, STREAM, simple test apps (mergesort, binary search)
Learned Models | Power Manager with speedup and cost learned online | Performance vs. Power | PARSECs
Multi-App Control | Power Manager with multiple applications | Performance vs. Power and Quality for multiple applications | Combinations of PARSECs

Dynamic Knobs: Creating Adaptive Applications
Turn static command line parameters into dynamic structure
• Application Goals: Maintain performance and minimize quality loss
• System Actions: Adjust memory locations to change application settings
• Benchmarks: bodytrack, swaptions, SWISH++, x264
• Experiment: Maintain performance when clock speed changes
Detail in Hoffmann et al. “Dynamic Knobs for Power Aware Computing” ASPLOS 2011

Results Enabling Dynamic Applications
[Figure: bodytrack, normalized performance versus time for Baseline, Power Cap, and Dynamic Knobs]
• Clock drops 2.4-1.6 GHz: w/o SEEC perf. drops, w/ SEEC perf. recovers
• Clock rises 1.6-2.4 GHz: SEEC returns quality to baseline

Results Enabling Dynamic Applications
[Figure: SWISH++, swaptions, and x264 panels, normalized performance versus time for Baseline, Power Cap, and Dynamic Knobs. Panel annotations: "Maintains performance despite noise", "Perfect behavior", "Maintains baseline performance"]
Dynamic knobs automatically enable dynamic response for a range of applications using a single mechanism

Optimizing Performance per Watt for Video Encoding
Adapt system behavior to needs of individual inputs
• Application Goals: Maintain 30 frame/s while minimizing power
• System Actions: Change cores, clock speed, and memory bandwidth
• Benchmark: x264 w/ 16 different 1080p inputs
• Experiment: Compare performance/Watt w/ SEEC to best static allocation of resources for each input

Average Performance per Watt for Video Encode
[Figure: normalized performance/Watt for static worst, static oracle, AL 0, AL 1, and AL 2]

Performance per Watt for All Inputs
[Figure: normalized performance/Watt per input for static worst, static oracle, AL 0, AL 1, and AL 2]

Learning Models Online
Adapt system behavior when initial models are wrong
• Application Goals: Four targets: Min, Median, Max, Max+. Minimize power consumption for each target
• System Actions: Change cores, clock speed, and mem. bandwidth. Initial models are incredibly optimistic (Assume linear speedup with any resource increase)
• Benchmark: STREAM
• Experiment: Converge to within 5% of target performance, measure power and convergence time

Complex Response to Resources
[Figure: speedup versus cores (1 to 8) for four configurations: 2.4 GHz, 2 MC; 2.4 GHz, 1 MC; 1.6 GHz, 2 MC; 1.6 GHz, 1 MC]
• Adding Cores Can Slow the App Down
• Adding Memory Bandwidth Can Slow Down

Power Savings
[Figure: total system power (Watts) versus performance target (Min, Median, Max, Max+) for Oracle, AL 0, AL 1, and AL 2]
AL 2 can provide 30 Watt power savings over AL1

Convergence Time
[Figure: convergence time (quanta) versus performance target (Min, Median, Max, Max+) for Oracle, AL 0, AL 1, and AL 2]

Managing Application and System Resources Concurrently
Manage multiple applications when clock frequency changes
• Application Goals:
  – bodytrack: maintain performance, minimize power
  – x264: maintain performance, minimize quality loss
• System Actions:
  – Change core allocation to both applications
  – Change x264’s algorithms
• Experiment: Maintain performance of both applications when clock frequency changes

Results SEEC Management of Multiple Applications
[Figure: bodytrack and x264 panels, normalized performance and cores versus time (heartbeat), with and without adaptation; clock drops 2.4-1.6 GHz]
• bodytrack: SEEC allocates cores to bodytrack; w/o SEEC app misses goals
• x264: SEEC removes cores from x264; w/o SEEC app exceeds goals; SEEC adjusts algorithm to meet goals

Summary of Case Studies
Experiment | Demonstrated Benefit of SEEC
Dynamic Knobs | Maintains application performance in the face of loss of compute resources
Performance/Watt | Out-performs oracle for static allocation of resources by adapting to fluctuations in input data
Performance/Watt with learning | Learns models online starting from initial model which is ~10x off true values
Multi-App control | Maintains performance of multiple apps by managing algorithm and system resources to adapt to loss of compute resources

Outline
• Introduction/Motivating Example
• The SEEC Framework
• Experimental Validation
• Conclusions

Angstrom’s Hardware Support for Self-Awareness
• “If you want to change something, measure it.”
• We need sensors to measure everything we want to manage:
  – Performance (there is a rich body of work in this area)
  – Power
  – Quality
  – Resiliency
  – …

Need Performance-level Support for Power Analysis
1. gprof -> pprof?
[Slide annotations over the profile below: Power??, Energy???]
    index  % time   self  children  called    name
                                              <spontaneous>
    -----------------------------------------------
                    0.00  0.05      1/1       main [2]
    [1]    100.0    0.00  0.05      1         report [3]
                    0.00  0.03      8/8       timelocal [6]
                    0.00  0.01      1/1       print [9]
                    0.00  0.01      9/9       fgets [12]
                    0.00  0.00      12/34     strncmp <cycle 1> [40]
                    0.00  0.00      8/8       lookup [20]
                    0.00  0.00      1/1       fopen [21]
                    0.00  0.00      8/8       chewtime [24]
                    0.00  0.00      8/16      skipspace [44]
    -----------------------------------------------
    [2]    59.8     0.01  0.02      8+472     <cycle 2 as a whole> [4]
                    0.01  0.02      244+260   offtime <cycle 2> [7]
                    0.00  0.00      236+1     tzset <cycle 2> [26]
    -----------------------------------------------

Angstrom’s Hardware Support for SEEC
Low-power core for SEEC runtime system

Angstrom’s System Software and Hardware Support for SEEC
Actuators: Algorithm, Cores, Cache, Network Bandwidth, TLB Size, Frequency, Memory Bandwidth, Registers, ALU Width, …
• Enable tradeoffs wherever possible
• Support fine-grain allocation of resources

Conclusions
• SEEC is designed to help ease programmer burden
  – Solves resource allocation problems
  – Adapts to fluctuations in environment
• SEEC has three distinguishing features
  – Incorporates goals and feedback directly from the application
  – Allows independent specification of adaptation
  – Uses an adaptive second order control system to manage adaptation
• Demonstrated the benefits of SEEC in several experiments
  – SEEC can optimize performance per Watt for video encoding
  – SEEC can adapt algorithms and resource allocation to meet goals in the face of power caps or other changes in environment
• SEEC navigates tradeoff spaces
  – So build more tradeoffs into apps, system software, system hardware, etc.

SEEC References
• Application Heartbeats Interface:
  – Hoffmann, Eastep, Santambrogio, Miller, Agarwal. Application Heartbeats: A Generic Interface for Specifying Program Performance and Goals in Autonomous Computing Environments. ICAC 2010.
• Controlling Heartbeat-enabled Systems:
  – Maggio, Hoffmann, Santambrogio, Agarwal, Leva. Controlling software applications within the Heartbeats framework. CDC 2010.
  – Maggio, Hoffmann, Agarwal, Leva. Control theoretic allocation of CPU resources. FeBID 2011.
• Creating Adaptive Applications:
  – Hoffmann, Misailovic, Sidiroglou, Agarwal, Rinard. Using Code Perforation to Improve Performance, Reduce Energy Consumption, and Respond to Failures. MIT-CSAIL-TR-2009-042. 2009.
  – Hoffmann, Sidiroglou, Carbin, Misailovic, Agarwal, Rinard. Dynamic Knobs for Power-Aware Computing. ASPLOS 2011.
• The SEEC Framework:
  – Hoffmann, Maggio, Santambrogio, Leva, Agarwal. SEEC: A Framework for Self-aware Management of Multicore Resources. MIT-CSAIL-TR-2011-016. 2011.
  – Hoffmann, Maggio, Santambrogio, Leva, Agarwal. SEEC: A Framework for Self-aware Computing. MIT-CSAIL-TR-2010-049. 2010.