
Adaptive Video Coding to Reduce Energy on General-Purpose Processors
Daniel Grobe Sachs, Sarita Adve, Douglas L. Jones
University of Illinois at Urbana-Champaign
http://www.cs.uiuc.edu/grace
Introduction
 Wireless multimedia increasingly common
 Recent advances reduce constraints:
 2GHz+ processors
 High-speed wireless networks
 Systems are now energy-limited
 Energy management essential
Adaptation
 Adaptation key to energy management
 Hardware adaptation already common
 Software adaptation also possible
 Challenges
 How do we control adaptations?
 How do we coordinate different adaptations?
GRACE Project
 Target mobile multimedia devices
 Coordinated adaptation of all system layers
 Hardware, application, network, OS
 Complete cross-layer adaptation framework
 Preserves separation between layers
Goals of this work
 Target wireless video transmission
 Adapt application: Adaptive video encoder
 Adapt hardware: Adaptive CPU
 Implement part of GRACE framework
 Trade off between CPU and network energy
Contributions
 Apply existing adaptive-CPU research
 Energy-adaptive video encoder
 Trades off between network, CPU
 Allows adaptation with fixed QoS
 Cross-layer adaptation framework
 Coordinate app and CPU adaptation
 Preserves logical separation between layers
 20% energy savings over existing systems
Presentation Overview
 System model
 System architecture and design
 Cross-layer adaptation process
 Results
System Model
[Diagram: Video Capture feeds the Adaptive Video Encoder, which runs on the Adaptive CPU and sends its output over the Wireless Network; a Control unit coordinates the encoder and CPU]
 Total Energy = CPU Energy + Network Energy
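
Written out as a formula (notation introduced here, using the per-instruction and per-byte energy models that appear later in the talk), the adaptation controller minimizes for each frame:

```latex
% Per-frame objective and deadline constraint (notation introduced here):
%   a = application configuration, c = CPU configuration
E_{\mathrm{total}}(a,c) \;=\;
    \underbrace{N_{\mathrm{instr}}(a)\, e_{\mathrm{instr}}(c)}_{\text{CPU energy}}
  + \underbrace{N_{\mathrm{bytes}}(a)\, e_{\mathrm{byte}}}_{\text{network energy}},
\qquad
\text{subject to } \frac{N_{\mathrm{instr}}(a)}{\mathrm{IPS}(c)} \le T_{\mathrm{frame}}
```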
CPU Hardware Adaptation
 Reduce performance to save energy
 Voltage and frequency scaling
 Lower frequency → lower voltage → lower energy
 Architecture adaptation
 Issue width
 Active functional units (ALUs, etc.)
 Instruction window size
[Micro]
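
For background, the standard first-order CMOS relation behind "lower frequency → lower voltage → lower energy" (a textbook model, not a result from these slides):

```latex
% Dynamic power and per-operation energy under voltage/frequency scaling
% (first-order CMOS model; \alpha = activity factor, C = switched capacitance):
P_{\mathrm{dyn}} \approx \alpha\, C\, V_{dd}^{2}\, f,
\qquad
E_{\mathrm{op}} \propto C\, V_{dd}^{2}
% Lowering f permits a lower V_dd, so running just fast enough to meet the
% frame deadline cuts energy per instruction roughly quadratically.
```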
Adaptive Encoder
 Based on TMN H.263 encoder
 Changed to logarithmic motion search
 Encoder adapts for energy
 Trade off between network and CPU energy
 More computation → fewer bits
 Adapt Motion Search and DCT
 Computationally expensive
 Eliminating them primarily affects bit rate
Adaptive Encoder Details
 Motion Search and DCT thresholds
 Terminate MS early when SAD under threshold
 Skip DCT if SAD of block under threshold
 Transmit “DCT flag” bit for each 8x8 block
 Extends H.263 standard
 Adaptation effect:
 Setting thresholds to infinity
 Reduces CPU load by ~50%
 Increases data rate by 2x or more
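
For concreteness, a minimal C sketch of the two knobs described above. The function and variable names (sad_16x16, next_log_search_step, code_dct_block, ms_threshold, dct_threshold) are illustrative placeholders, not the actual TMN H.263 encoder API.

```c
/* Sketch of the two encoder thresholds: early termination of the logarithmic
 * motion search, and skipping the DCT of an 8x8 block (signaled by a per-block
 * "DCT flag" bit). Illustrative names only, not the TMN encoder's code. */
#include <limits.h>

extern int  sad_16x16(const unsigned char *cur, const unsigned char *ref,
                      int mvx, int mvy);
extern int  sad_8x8(const unsigned char *cur, const unsigned char *pred);
extern int  next_log_search_step(int *mvx, int *mvy);  /* 0 when search done */
extern void code_dct_block(const unsigned char *cur, const unsigned char *pred);
extern void put_bit(int bit);                          /* bitstream output   */

int ms_threshold;   /* raise to terminate motion search earlier               */
int dct_threshold;  /* raise to skip the DCT on more blocks                   */
                    /* (both at "infinity": ~50% less CPU, >=2x the bit rate) */

/* Logarithmic motion search with SAD-threshold early termination. */
int motion_search(const unsigned char *cur, const unsigned char *ref,
                  int *mvx, int *mvy)             /* mvx/mvy start at (0,0) */
{
    int best_sad = INT_MAX;
    do {
        int sad = sad_16x16(cur, ref, *mvx, *mvy);
        if (sad < best_sad)
            best_sad = sad;
        if (best_sad < ms_threshold)              /* good enough: stop early */
            break;
    } while (next_log_search_step(mvx, mvy));
    return best_sad;
}

/* Per-block residual coding: skip the DCT when the prediction is good. */
void code_block(const unsigned char *cur, const unsigned char *pred)
{
    if (sad_8x8(cur, pred) < dct_threshold) {
        put_bit(0);                   /* "DCT flag": no coefficients sent    */
    } else {
        put_bit(1);
        code_dct_block(cur, pred);    /* normal DCT + quantization path      */
    }
}
```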
Adaptation Control
 When do we adapt?
 Adapt before every frame
 What configurations do we choose?
 Must minimize total CPU+network energy
 Must complete frame within its allocated time
 How do we find the optimal configurations?
Optimization
 Application, CPU reconfiguration linked
 Application reconfiguration changes workload
 CPU reconfiguration changes performance
 App config affects optimal CPU configuration
… and vice versa
 Two-stage approach
1. For each app config, find CPU config, energy
2. Pick lowest-energy application configuration
Optimization Algorithm
1. For each app config, find
 Best CPU config: completes in time with least energy [MICRO'01] (requires predicted instruction count)
 CPU energy = Instruction count × Energy per instruction [MICRO'01]
 Network energy = Byte count × Energy per byte [WaveLAN, measured] (requires predicted byte count)
 Total energy = CPU energy + network energy
2. Pick app config with lowest total energy
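
The two-stage search might look like the following C sketch. All types and names are illustrative; best_cpu_config stands in for the [MICRO'01] CPU-configuration algorithm (cheapest CPU configuration that still meets the frame deadline), and the per-instruction/per-byte energies correspond to the models above.

```c
/* Sketch of the two-stage per-frame optimization (illustrative types/names). */
#include <float.h>
#include <stddef.h>

struct cpu_config { double energy_per_instr; double instrs_per_sec; };

extern struct cpu_config cpu_configs[];   /* available CPU configurations     */
extern size_t num_cpu_configs;
extern double energy_per_byte;            /* measured for the WaveLAN card    */
extern double frame_deadline;             /* seconds allotted to the frame    */

/* Predicted cost of one application configuration for the next frame. */
struct app_estimate { double instr_count; double byte_count; };

/* Stage 1 helper: cheapest CPU configuration that finishes in time
 * (stand-in for the MICRO'01 algorithm). */
static const struct cpu_config *best_cpu_config(double instr_count)
{
    const struct cpu_config *best = NULL;
    for (size_t i = 0; i < num_cpu_configs; i++) {
        if (instr_count / cpu_configs[i].instrs_per_sec > frame_deadline)
            continue;                     /* would miss the frame deadline    */
        if (!best || cpu_configs[i].energy_per_instr < best->energy_per_instr)
            best = &cpu_configs[i];
    }
    return best;                          /* NULL if no config is fast enough */
}

/* Two-stage search: returns the index of the lowest-energy app config. */
size_t choose_app_config(const struct app_estimate *est, size_t num_app_configs,
                         const struct cpu_config **chosen_cpu)
{
    size_t best_app = 0;
    double best_energy = DBL_MAX;
    *chosen_cpu = NULL;

    for (size_t a = 0; a < num_app_configs; a++) {               /* Stage 1 */
        const struct cpu_config *cpu = best_cpu_config(est[a].instr_count);
        if (!cpu)
            continue;
        double e_cpu   = est[a].instr_count * cpu->energy_per_instr;
        double e_net   = est[a].byte_count  * energy_per_byte;
        double e_total = e_cpu + e_net;
        if (e_total < best_energy) {                             /* Stage 2 */
            best_energy = e_total;
            best_app    = a;
            *chosen_cpu = cpu;
        }
    }
    return best_app;
}
```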
Adaptation Process: Stage 1
[Diagram: For each application configuration (App. Conf. 1 ... n), predictors estimate the next frame's instruction count and byte count. The CPU optimizer uses the instruction count to find the CPU configuration; the CPU energy estimator predicts CPU energy, and the network energy estimator predicts network energy from the byte count. The two are summed and recorded in the app configuration energy table (Conf 1, Conf 2, ..., Conf n → Energy).]
Adaptation Process: Stage 2
[Diagram: The lowest-energy entry is picked from the app configuration energy table (Conf 1 ... Conf n); the chosen configuration is handed to the CPU adaptor and the application adaptor; the frame is then captured, encoded, and transmitted.]
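
Putting the two stages together, a per-frame control loop along these lines would drive the process. This reuses the types and choose_app_config from the optimizer sketch above; the remaining functions are illustrative stand-ins for the predictor and adaptor components in the diagrams.

```c
/* Per-frame adaptation loop (sketch; builds on the optimizer sketch above). */
#include <stddef.h>

extern void predict_counts(struct app_estimate *est, size_t num_app_configs);
extern void set_cpu_config(const struct cpu_config *cfg);   /* CPU adaptor    */
extern void set_encoder_config(size_t app_config);          /* app adaptor    */
extern void capture_encode_transmit(void);

void encode_sequence(size_t num_app_configs, struct app_estimate *est)
{
    for (;;) {                                    /* once per frame           */
        predict_counts(est, num_app_configs);     /* instr + byte predictions */
        const struct cpu_config *cpu;
        size_t app = choose_app_config(est, num_app_configs, &cpu); /* Stages 1+2 */
        set_cpu_config(cpu);                      /* reconfigure hardware     */
        set_encoder_config(app);                  /* reconfigure encoder      */
        capture_encode_transmit();                /* do the actual work       */
    }
}
```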
Predictors
 How do we predict instructions and bytes?
 Fixed software → use previous frame's data
 Adaptive software → this no longer works!
 Solution: Offline profiling
 Encode reference sequences offline
 Transition randomly between app. configs
 Fit predictors to transitions between configs
 Map last frame's instruction and byte counts to the new app config
 Linear, 1st-order predictors
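
A minimal sketch of such predictors, assuming one (slope, intercept) pair per (previous config, candidate config) transition fitted offline; the table sizes, contents, and names are illustrative, and struct app_estimate is reused from the optimizer sketch above.

```c
/* Sketch of the offline-fitted, first-order linear predictors (illustrative;
 * the real coefficient tables come from profiling reference sequences with
 * random transitions between application configurations). */
#include <stddef.h>
/* struct app_estimate is defined in the optimizer sketch above. */

struct lin_pred { double slope; double intercept; };

#define NUM_APP_CONFIGS 4   /* placeholder value */

/* pred_instr[prev][next] maps last frame's instruction count under config
 * `prev` to a predicted count under candidate config `next`; likewise bytes. */
extern struct lin_pred pred_instr[NUM_APP_CONFIGS][NUM_APP_CONFIGS];
extern struct lin_pred pred_bytes[NUM_APP_CONFIGS][NUM_APP_CONFIGS];

static double predict(const struct lin_pred *p, double last_value)
{
    return p->slope * last_value + p->intercept;
}

/* Fill the per-config estimates used by the optimizer, given the counts
 * observed for the previous frame under configuration `prev`. */
void predict_counts_from(size_t prev, double last_instrs, double last_bytes,
                         struct app_estimate *est /* NUM_APP_CONFIGS entries */)
{
    for (size_t next = 0; next < NUM_APP_CONFIGS; next++) {
        est[next].instr_count = predict(&pred_instr[prev][next], last_instrs);
        est[next].byte_count  = predict(&pred_bytes[prev][next], last_bytes);
    }
}
```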
Experiments
 RSIM CPU simulator
 State-of-the-art CPU, memory
 Princeton Wattch energy model
 Reported energy typical of modern CPUs
 Simulation Conditions:
 Fixed and adaptive CPU
 Fixed and adaptive software
 Foreman sequence
Fixed vs Adaptive Systems
[Chart: Total (stacked CPU + network) energy in joules, Foreman sequence]
 Fixed System: 30.49 J
 Adaptive S/W: 21.23 J
 Adaptive H/W: 7.36 J
 Adaptive Sys: 6.25 J
 Adaptive hardware saves 70% over fixed system
 Adaptive application saves
 30% on fixed hardware
 20% on adaptive hardware (total savings of 80%)
Algorithm Comparison
 Baseline: Fixed software, adaptive hardware
 Adaptive software:
 Adaptive DCT/motion thresholds
 Instruction, byte count for next frame predicted
 Oracle
 Instruction and byte count for next frame exact
 Adapt-Once
 Adapt once at start of encoding
 Minimize total energy across entire sequence
Algorithm Comparison
[Chart: Total (stacked CPU + network) energy in joules, adaptive hardware]
 Fixed: 7.36 J
 Adapt Once: 6.55 J
 Adaptive: 6.25 J
 Oracle: 6.09 J
 Energy consumption of Adaptive within 3% of Oracle
 Simple predictors sufficient for energy savings
 Adaptive saves 5% over Adapt-Once
 Frame-by-frame adaptation can save energy
Other test cases
 Low Power CPU
 Network energy dominates
 Software adaptation did not save energy
 Carphone
 Little inter-frame variation
 One-shot adaptation was sufficient
 Adapt-Once, Adaptive, Oracle same energy
 Adaptive software saved ~15%
Conclusions
 A new framework for coordinated CPU/application adaptation
 Combined benefits of both adaptations
 Preserves separation between layers
 Adaptive applications save energy:
 Up to 20% on adaptive hardware
 Up to 30% on fixed hardware