Dynamic Thermal Management - cs.Virginia

The Laboratory for Computer Architecture at Virginia (LAVA)
Kevin Skadron
University of Virginia
Department of Computer Science
Why We Care About Thermal Management...
Source: Tom’s Hardware Guide
http://www6.tomshardware.com/cpu/01q3/010917/heatvideo-01.html
Dynamic Thermal Management

- Dynamically adjust execution to control temperature
- Avoid catastrophic failure (heat sink, fan)
- Permit the use of a less expensive thermal package
  - Design for less than the worst case
  - Package costs ~$1/W above ~40 W
  - Peak power as high as 130 W in 1-2 generations (SIA roadmap)
  - Temperatures over 100°C
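The cost argument can be made concrete with the slide's rule of thumb. The sketch below treats "~$1/W above ~40 W" as an exact linear model, which is a simplification; the 100 W design point in the example is a hypothetical target that DTM might allow, not a figure from the talk.

```python
def package_cost_premium(design_power_w: float, knee_w: float = 40.0, usd_per_w: float = 1.0) -> float:
    """Extra thermal-package cost under the rule of thumb of ~$1 per watt above ~40 W.

    Linear model (an assumption): no premium below the knee, usd_per_w per watt above it.
    """
    return max(0.0, design_power_w - knee_w) * usd_per_w


if __name__ == "__main__":
    worst_case = package_cost_premium(130.0)  # 90.0: package sized for the 130 W peak
    with_dtm = package_cost_premium(100.0)    # hypothetical lower design point enabled by DTM
    print(worst_case, with_dtm, worst_case - with_dtm)  # 90.0 60.0 30.0
```

Under these assumptions, every watt of worst-case power that DTM removes from the design target saves roughly a dollar per part.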
Dynamic Thermal Management

- Deal with “hot spots”
  - Localized heating occurs much faster than chip-wide heating
  - Chip-wide treatment is too conservative
- Prove temperature will be safely bounded
Thermal Modeling

- Want a fine-grained model of temperature
- Power dissipation: too indirect, not easy to measure in HW
“Ohm’s Law” for Temperature
- V ↔ temperature
- I ↔ power
- R ↔ thermal resistance
- C ↔ thermal capacitance
- RC ↔ time constant

$$\Delta V = \frac{I\,\Delta t}{C} - \frac{V\,\Delta t}{RC}$$

- Lets us compute stepwise changes in temperature at any granularity for which we can get P, T, R, and C
- Steady state: V = IR (T = PR)
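A minimal sketch of the stepwise update for a single lumped node, assuming temperature is measured as the rise above ambient and using a forward-Euler step, which is accurate only when Δt is much smaller than RC. The R, C, P, and Δt values are illustrative, not taken from the talk.

```python
import math


def rc_step(temp: float, power: float, r: float, c: float, dt: float) -> float:
    """One forward-Euler step of the lumped thermal RC model.

    temp  : temperature rise above ambient (K), the analogue of V
    power : dissipated power (W), the analogue of I
    r, c  : thermal resistance (K/W) and thermal capacitance (J/K)
    Implements Delta T = P*dt/C - T*dt/(R*C).
    """
    return temp + power * dt / c - temp * dt / (r * c)


def simulate(power: float, r: float, c: float, dt: float, steps: int, temp: float = 0.0) -> float:
    """Apply rc_step repeatedly under constant power; return the final temperature rise."""
    for _ in range(steps):
        temp = rc_step(temp, power, r, c, dt)
    return temp


if __name__ == "__main__":
    R, C, P, DT = 2.0, 0.5, 10.0, 0.01      # illustrative values, so RC = 1 s
    print(simulate(P, R, C, DT, 100))        # close to the exact value after one time constant...
    print(P * R * (1 - math.exp(-1)))        # ...which is P*R*(1 - 1/e)
    print(simulate(P, R, C, DT, 2000))       # approaches the steady state T = P*R = 20
```

The last line shows the slide's steady-state claim: once ΔT reaches zero, P·Δt/C equals T·Δt/(RC), so T = PR.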
Thermal Modeling
- Use thermal resistance and capacitance of Si
- Develop a computationally efficient model based on lumped values:

$$\Delta T_i = \frac{P_i\,\Delta t}{C_i} - \frac{T_i\,\Delta t}{R_i C_i}$$

- Integrate in Wattch (power/performance simulator)
- Time evolution of temperature is driven by unit activities and power dissipations on a per-cycle basis
- Detect hot spots and activate thermal response
- Typical time constant: 10-100 µs
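The per-block version of the model can be sketched as a set of independent lumped nodes, each advanced with its own P_i, R_i, and C_i; the hottest node is the current hot spot. This is a sketch under stated assumptions: the block names, R/C values, powers, and the 1 µs step are hypothetical, and blocks exchange no heat laterally (thermal coupling among blocks is one of the open questions raised later in the talk).

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One lumped thermal node (hypothetical parameters, not the talk's values)."""
    name: str
    r: float          # thermal resistance, K/W
    c: float          # thermal capacitance, J/K
    t: float = 0.0    # temperature rise above the base temperature, K


def step_blocks(blocks: list, powers: dict, dt: float) -> None:
    """Advance each block by dt using Delta T_i = P_i*dt/C_i - T_i*dt/(R_i*C_i).

    Blocks are independent nodes here: no heat flows between neighbours.
    """
    for b in blocks:
        p = powers[b.name]
        b.t += p * dt / b.c - b.t * dt / (b.r * b.c)


def hottest(blocks: list) -> Block:
    """Return the block with the highest temperature rise (the current hot spot)."""
    return max(blocks, key=lambda b: b.t)


if __name__ == "__main__":
    blocks = [Block("bpred", 1.5, 5e-5), Block("lsq", 1.0, 5e-5), Block("dcache", 0.8, 5e-5)]
    powers = {"bpred": 6.0, "lsq": 4.0, "dcache": 8.0}   # W, made-up constant per-block power
    for _ in range(2000):                                 # 2000 steps of 1 us each
        step_blocks(blocks, powers, 1e-6)
    hot = hottest(blocks)
    print(hot.name, round(hot.t, 3))                      # bpred 9.0 (its steady state P*R)
```

In a simulator such as Wattch, `powers` would be refreshed each interval from per-unit activity rather than held constant.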
Fetch Toggling

- Fetch toggling
  - Disable fetch every N cycles
  - 4/5, 2/3, 1/2, 1/3, 1/5, …
[Figure: pipeline diagram with stages IF, ID, EX, MEM, WB]
- How to set the fetch rate?
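One way to realize a fetch rate such as 4/5 or 1/3 is to spread the enabled cycles evenly across each period. The sketch below assumes each fraction is the share of cycles in which fetch is enabled (the slide does not define it precisely) and uses an integer accumulator, so the pattern repeats every denominator cycles without drifting.

```python
from fractions import Fraction


def fetch_enabled(cycle: int, duty: Fraction) -> bool:
    """True if instruction fetch is allowed on this cycle.

    Assumes duty is the fraction of cycles with fetch enabled. Fetch is enabled on
    cycle c exactly when floor((c+1)*p/q) > floor(c*p/q), which spreads p enabled
    cycles evenly over every q cycles.
    """
    if not 0 <= duty <= 1:
        raise ValueError("duty must be between 0 and 1")
    p, q = duty.numerator, duty.denominator
    return (cycle + 1) * p // q > cycle * p // q


def schedule(duty: Fraction, n: int) -> list:
    """Fetch-enable pattern for the first n cycles."""
    return [fetch_enabled(c, duty) for c in range(n)]


if __name__ == "__main__":
    print(schedule(Fraction(2, 3), 6))  # [False, True, True, False, True, True]
```

Using `Fraction` keeps the schedule exact for any rational rate; a feedback controller could then choose among the listed rates rather than using a fixed one.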
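Feedback Control of Fetch Toggling

- Formal feedback control

[Figure: feedback loop. The setpoint and the measured temperature T form the error e; the Controller outputs m to the actuator (I-fetch toggling), which sets power P; the thermal dynamics turn P into temperature T, which the temperature sensor feeds back as measured T.]

- PID: $m = K_C \left( e + K_I \int e\,dt + K_D \frac{de}{dt} \right)$
- Easy to compute
- Toggling = f(m)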
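A discrete-time version of the PID law above can be sketched as follows, with the integral approximated by a running sum and the derivative by a backward difference over one sampling interval. The gains, the sign convention (e = measured − setpoint, so positive m means "too hot"), and the mapping f(m) from controller output to fetch rate are all hypothetical choices for illustration, not the ones used in the talk.

```python
class PIDController:
    """Discrete PID: m = Kc * (e + Ki * sum(e * dt) + Kd * de/dt).

    Sign convention (an assumption): e = measured - setpoint, so m > 0 means too hot.
    """

    def __init__(self, kc: float, ki: float, kd: float, setpoint: float, dt: float):
        self.kc, self.ki, self.kd = kc, ki, kd
        self.setpoint = setpoint
        self.dt = dt                 # sampling interval between sensor readings
        self.integral = 0.0          # running approximation of the integral of e
        self.prev_error = None       # no derivative term on the first sample

    def update(self, measured: float) -> float:
        e = measured - self.setpoint
        self.integral += e * self.dt
        de = 0.0 if self.prev_error is None else (e - self.prev_error) / self.dt
        self.prev_error = e
        return self.kc * (e + self.ki * self.integral + self.kd * de)


def fetch_fraction(m: float) -> float:
    """Hypothetical f(m): share of cycles with fetch enabled, clamped to [0, 1]."""
    return min(1.0, max(0.0, 1.0 - m))


if __name__ == "__main__":
    pid = PIDController(kc=1.0, ki=0.0, kd=0.0, setpoint=107.9, dt=1.0)
    print(fetch_fraction(pid.update(107.0)))  # below the set point: fetch every cycle (1.0)
    print(fetch_fraction(pid.update(108.4)))  # 0.5 K above: fetch roughly half the cycles
```

A practical controller would usually also bound the integral term, since a long cool phase otherwise accumulates a large negative sum that delays the response when the chip heats up.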
Other Thermal-Management Techniques

- Fetch toggling
- Fetch throttling
- Decode throttling
- Speculation control
- Frequency/voltage scaling
Per-Structure Response

- Hot spots
  - Branch predictor (probed every cycle)
  - Load-store queue
  - L1 D-cache (for high-bandwidth apps)
  - …most major structures are a hot spot for at least one SPEC2k app
- Modified Wattch
- Sampling interval: 1000 cycles (RC of hot spots is 10-100 µs)
- Base temperature of 100°C (SIA roadmap)
- Emergency threshold of 108°C (Yuan/Hong, SEMI-THERM ’01)
- Set point of 107.9°C
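Per-structure response means each structure's sampled temperature is checked on its own against the set point and the emergency threshold. The sketch below uses the 100°C base, 108°C emergency threshold, and 107.9°C set point from the slide; the three-way classification, the strict comparisons, and the structure names in the example are assumptions for illustration. Each dict in `samples` is assumed to be one reading per structure, taken every 1000 cycles.

```python
BASE_TEMP_C = 100.0   # base temperature from the slide (SIA roadmap)
EMERGENCY_C = 108.0   # emergency threshold from the slide (Yuan/Hong)
SETPOINT_C = 107.9    # controller set point from the slide


def absolute_temp(rise_k: float) -> float:
    """Convert a modelled rise above the base temperature into degrees C."""
    return BASE_TEMP_C + rise_k


def classify(temps_c: dict) -> dict:
    """Label each structure for one sample: 'emergency', 'respond', or 'ok' (hypothetical labels)."""
    status = {}
    for name, t in temps_c.items():
        if t > EMERGENCY_C:
            status[name] = "emergency"
        elif t > SETPOINT_C:
            status[name] = "respond"
        else:
            status[name] = "ok"
    return status


def emergency_counts(samples: list) -> dict:
    """Count, per structure, how many samples exceeded the emergency threshold."""
    counts = {}
    for temps in samples:
        for name, s in classify(temps).items():
            counts[name] = counts.get(name, 0) + int(s == "emergency")
    return counts


if __name__ == "__main__":
    print(classify({"bpred": 108.2, "lsq": 107.95, "dcache": 105.0}))
```

Placing the set point just 0.1°C below the emergency threshold lets the controller engage only when a structure is close to danger, which keeps the performance cost of the response low.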
Thermal Modeling: Where to go from here?
(i.e., lots of research questions)

- Floorplanning issues and granularity of lumped R/C values
- Thermal coupling among blocks
- Response lag in temperature sensors
- Validation techniques
- Visualization
- How to deal with large time scales?
Thermal Management: Where to go from here?
(i.e., lots more research questions)

- New mechanisms
- Characterize benchmarks
- When to use frequency/voltage scaling
- Faster HW techniques for sensing temperature changes
- Robust response despite sensor lag
- Hot spots
- Temperature effects on leakage current
- Joint control of temperature, power, and performance
Summary

- New tools for thermal management
  - Models
  - Mechanisms
Backup slides
Performance Loss
- Performance loss reduced by 65%

[Figure: percent loss in performance (y-axis, 0% to 30%) for bzip, vortex, perlbmk, eon, parser, fma3d, facerec, crafty, equake, art, mesa, gcc, and MEAN, comparing toggle1 and PID]