Adaptive Video Coding to Reduce Energy on General Purpose Processors
Daniel Grobe Sachs, Sarita Adve, Douglas L. Jones
University of Illinois at Urbana-Champaign
http://www.cs.uiuc.edu/grace

Introduction
• Wireless multimedia is increasingly common
• Recent advances reduce constraints:
  - 2 GHz+ processors
  - High-speed wireless networks
• Systems are now energy-limited
  - Energy management is essential

Adaptation
• Adaptation is key to energy management
  - Hardware adaptation is already common
  - Software adaptation is also possible
• Challenges
  - How do we control adaptations?
  - How do we coordinate different adaptations?

GRACE Project
• Targets mobile multimedia devices
• Coordinated adaptation of all system layers
  - Hardware, application, network, OS
• Complete cross-layer adaptation framework
  - Preserves separation between layers

Goals of this work
• Target wireless video transmission
• Adapt the application: adaptive video encoder
• Adapt the hardware: adaptive CPU
• Implement part of the GRACE framework
• Trade off between CPU and network energy

Contributions
• Apply existing adaptive-CPU research
• Energy-adaptive video encoder
  - Trades off between network and CPU energy
  - Allows adaptation with fixed QoS
• Cross-layer adaptation framework
  - Coordinates application and CPU adaptation
  - Preserves logical separation between layers
• 20% energy savings over existing systems

Presentation Overview
• System model
• System architecture and design
• Cross-layer adaptation process
• Results

System Model
[Figure: system block diagram with Video Capture, Adaptive Video Encoder, Adaptive CPU, Wireless Network, and Control]
Total Energy = CPU Energy + Network Energy

CPU Hardware Adaptation
• Reduce performance to save energy
• Voltage and frequency scaling
  - Lower frequency → lower voltage → lower energy
• Architecture adaptation
  - Issue width
  - Active functional units (ALUs, etc.)
  - Instruction window size [Micro]

Adaptive Encoder
• Based on the TMN H.263 encoder
  - Changed to logarithmic motion search
• Encoder adapts for energy
  - Trades off between network and CPU energy
  - More computation → fewer bits
• Adapts motion search and DCT
  - Both are computationally expensive
  - Eliminating them affects primarily the bit rate

Adaptive Encoder Details
• Motion search and DCT thresholds (on SAD, the sum of absolute differences)
  - Terminate motion search early when the SAD is under a threshold
  - Skip the DCT if the block's SAD is under a threshold
  - Transmit a "DCT flag" bit for each 8x8 block (extends the H.263 standard)
• Adaptation effect of setting both thresholds to infinity:
  - Reduces CPU load by ~50%
  - Increases data rate by 2x or more

Adaptation Control
• When do we adapt?
  - Before every frame
• What configurations do we choose?
  - Must minimize total CPU + network energy
  - Must complete each frame within its allocated time
• How do we find the optimal configurations?

Optimization
• Application and CPU reconfiguration are linked
  - Application reconfiguration changes the workload
  - CPU reconfiguration changes performance
  - The application configuration affects the optimal CPU configuration, and vice versa
• Two-stage approach
  1. For each application configuration, find the CPU configuration and its energy
  2. Pick the lowest-energy application configuration

Optimization Algorithm
1. For each application configuration, find:
   • Best CPU configuration: completes in time, with least energy [MICRO'01] (requires the instruction count)
   • CPU energy = Instruction count x Energy per instruction [MICRO'01]
   • Network energy = Byte count x Energy per byte [WaveLAN measured] (requires the byte count)
   • Total energy = CPU energy + Network energy
2. Pick the application configuration with the lowest total energy

Adaptation Process: Stage 1
For each application configuration (Conf 1 through Conf n):
• Predict the next frame's instruction count
  - CPU Optimizer: find the CPU configuration
  - CPU Energy Estimator: predict the CPU energy
• Predict the next frame's byte count
  - Network Energy Estimator: predict the network energy
• Add CPU and network energy and store the sum in the application configuration energy table (one energy entry per configuration, Conf 1 ... Conf n)

Adaptation Process: Stage 2
• Pick the lowest-energy entry in the table
• Pass the chosen configuration to the CPU Adaptor and the Application Adaptor
• Capture, encode, and transmit the frame

Predictors
• How do we predict instruction and byte counts?
  - Fixed software: use the previous frame's data
  - Adaptive software: this no longer works, because the previous frame may have been encoded with a different configuration
• Solution: offline profiling
  - Encode reference sequences offline
  - Transition randomly between application configurations
  - Fit predictors to the transitions between configurations
  - Map the last instruction and byte counts to the new application configuration
  - Linear, first-order predictors

Experiments
• RSIM CPU simulator
  - State-of-the-art CPU and memory
• Princeton Wattch energy model
  - Reported energy is typical of modern CPUs
• Simulation conditions:
  - Fixed and adaptive CPU
  - Fixed and adaptive software
  - Foreman sequence

Fixed vs Adaptive Systems
[Figure: bar chart of energy (J) per system, bars split into CPU and network energy]
System              Energy (J)
Fixed System        30.49
Adaptive S/W        21.23
Adaptive H/W         7.36
Adaptive System      6.25
• Adaptive hardware saves 70% over the fixed system
• The adaptive application saves:
  - 30% on fixed hardware
  - 20% on adaptive hardware (total savings of 80%)

Algorithm Comparison
• Baseline: fixed software, adaptive hardware
• Adaptive software: adaptive DCT/motion-search thresholds
  - Instruction and byte counts for the next frame are predicted
• Oracle
  - Instruction and byte counts for the next frame are exact
• Adapt-Once
  - Adapt once at the start of encoding
  - Minimize total energy across the entire sequence

[Figure: bar chart of energy (J) per algorithm, bars split into CPU and network energy]
Algorithm           Energy (J)
Fixed                7.36
Adapt-Once           6.55
Adaptive             6.25
Oracle               6.09
• Energy consumption of Adaptive is within 3% of Oracle
  - Simple predictors are sufficient for energy savings
• Adaptive saves 5% over Adapt-Once
  - Frame-by-frame adaptation can save energy

Other Test Cases
• Low-power CPU
  - Network energy dominated
  - Software adaptation did not save energy
• Carphone
  - Little inter-frame variation
  - One-shot adaptation was sufficient
  - Adapt-Once, Adaptive, and Oracle consumed the same energy
  - Adaptive software saved ~15%

Conclusions
• A new framework for coordinated CPU/application adaptation
  - Combines the benefits of both adaptations
  - Preserves separation between layers
• Adaptive applications save energy:
  - Up to 20% on adaptive hardware
  - Up to 30% on fixed hardware
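The per-frame two-stage optimization (Optimization Algorithm and Adaptation Process slides) can be sketched as an exhaustive search. This is a minimal sketch, not the authors' implementation: `CpuConfig`, `AppConfig`, `best_cpu_config`, and `choose_configs` are hypothetical names; each CPU configuration is assumed to be summarized by a fixed throughput and a fixed energy per instruction (the [MICRO'01] CPU model is more detailed); and the per-configuration instruction and byte counts are assumed to come from the predictors.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class CpuConfig:
    """Hypothetical CPU setting (e.g. a DVS point plus architecture settings)."""
    name: str
    instr_per_sec: float      # assumed sustained throughput (IPC x frequency)
    energy_per_instr: float   # assumed joules per instruction


@dataclass(frozen=True)
class AppConfig:
    """Hypothetical encoder setting (e.g. a pair of MS/DCT thresholds)."""
    name: str
    pred_instructions: float  # predicted instruction count for the next frame
    pred_bytes: float         # predicted encoded size of the next frame


def best_cpu_config(instructions: float, frame_time: float,
                    cpu_configs: Sequence[CpuConfig]) -> Optional[CpuConfig]:
    """Least-energy CPU config that finishes the frame in time, or None."""
    feasible = [c for c in cpu_configs
                if instructions / c.instr_per_sec <= frame_time]
    if not feasible:
        return None
    return min(feasible, key=lambda c: instructions * c.energy_per_instr)


def choose_configs(app_configs: Sequence[AppConfig],
                   cpu_configs: Sequence[CpuConfig],
                   frame_time: float,
                   energy_per_byte: float) -> Tuple[AppConfig, CpuConfig, float]:
    # Stage 1: build the application configuration energy table.
    table = []
    for app in app_configs:
        cpu = best_cpu_config(app.pred_instructions, frame_time, cpu_configs)
        if cpu is None:
            continue  # this encoder setting cannot meet the frame deadline
        cpu_energy = app.pred_instructions * cpu.energy_per_instr
        net_energy = app.pred_bytes * energy_per_byte
        table.append((cpu_energy + net_energy, app, cpu))
    if not table:
        raise ValueError("no configuration meets the frame deadline")
    # Stage 2: pick the entry with the lowest total energy.
    total, app, cpu = min(table, key=lambda entry: entry[0])
    return app, cpu, total
```

The search visits every (application, CPU) pair once per frame, which is only practical under the assumption that each side offers a handful of configurations. Note how the trade-off appears: a cheaper encoder setting can win even though it sends more bytes, because it lets a slower, lower-energy CPU setting meet the deadline.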
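The Predictors slide describes linear, first-order predictors fitted offline to transitions between application configurations. One possible reading is sketched below: for every (previous config, new config) pair, fit next = a x last + b by ordinary least squares on profiling data, then use one predictor instance for instruction counts and another for byte counts. `fit_linear` and `TransitionPredictor` are hypothetical names, and the per-pair model layout is an assumption rather than the authors' exact formulation.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, mean_y  # constant input: fall back to predicting the mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


class TransitionPredictor:
    """Predicts a next-frame count from the last frame's count,
    with a separate linear model for each (old_cfg, new_cfg) transition."""

    def __init__(self) -> None:
        # Each transition maps to a pair of lists: (last counts, next counts).
        self.samples: Dict[Tuple[str, str], Tuple[List[float], List[float]]] = \
            defaultdict(lambda: ([], []))
        self.models: Dict[Tuple[str, str], Tuple[float, float]] = {}

    def add_sample(self, old_cfg: str, new_cfg: str,
                   last_count: float, next_count: float) -> None:
        """Record one profiled transition from an offline reference encode."""
        xs, ys = self.samples[(old_cfg, new_cfg)]
        xs.append(last_count)
        ys.append(next_count)

    def fit(self) -> None:
        """Fit one linear model per observed transition."""
        self.models = {pair: fit_linear(xs, ys)
                       for pair, (xs, ys) in self.samples.items()}

    def predict(self, old_cfg: str, new_cfg: str, last_count: float) -> float:
        a, b = self.models[(old_cfg, new_cfg)]
        return a * last_count + b
```

At run time, the predicted counts for each candidate configuration would feed the Stage 1 energy estimators. Transitions never seen during profiling raise a `KeyError` here; a real system would need a fallback for them.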