AMNESIAC: Amnesic Automatic Computer
Trading Computation for Communication for Energy Efficiency
Ismail Akturk and Ulya R. Karpuzcu
University of Minnesota, Twin Cities
{aktur002, ukarpuzc}@umn.edu
Motivation
Technology Node
40nm
10nm
Energy of on chip communication
normalized to computation
1.55x
5.77x
Keckler, S. W., Dally, W. J., Khailany, B., Garland, M., and Glasco, D. “GPUs and the Future of Parallel Computing”, IEEE Micro 31, 5 (2011).
AMNESIAC: Amnesic Automatic Computer
2
Motivation
Technology Node
40nm
10nm
Energy of on chip communication
normalized to computation
1.55x
5.77x
(Re)computing becomes cheaper than memorizing and retrieving data
Keckler, S. W., Dally, W. J., Khailany, B., Garland, M., and Glasco, D. “GPUs and the Future of Parallel Computing”, IEEE Micro 31, 5 (2011).
AMNESIAC: Amnesic Automatic Computer
3
Amnesic* Automatic Computer
• Idea: Replace energy-hungry load with sequence of instructions
*amnesia: noun a partial or total loss of memory (origin late 18th cent.: from Greek amnēsia ‘forgetfulness’)
AMNESIAC: Amnesic Automatic Computer
4
Amnesic* Automatic Computer
• Idea: Replace energy-hungry load with sequence of instructions
• Scope
• How to determine which data values to recompute
• How to identify instructions to recompute data values
• How to protect architectural state during recomputation
• How to orchestrate recomputation at runtime
*amnesia: noun a partial or total loss of memory (origin late 18th cent.: from Greek amnēsia ‘forgetfulness’)
AMNESIAC: Amnesic Automatic Computer
5
How to determine which data values to recompute
core
L1
core
load A
L2
memory
Eld,A ? Erc,A
load B
L1
L2
A
core
B
memory
Eld,B ? Erc,B
AMNESIAC: Amnesic Automatic Computer
L1 C
load C
L2
memory
Eld,C ? Erc,C
6
Recomputation Slice (RSlice)
AMNESIAC: Amnesic Automatic Computer
7
Overview
• Amnesic Compiler
• Identifies independent RSlices
• Annotates RSlices: RCMP, REC, RTN
AMNESIAC: Amnesic Automatic Computer
8
Overview
• Amnesic Compiler
• Identifies independent RSlices
• Annotates RSlices: RCMP, REC, RTN
• Amnesic Scheduler
• Decides when to recompute
• Can use various policies
• Compiler
• First-Level Cache Miss - FLC
• Last-Level Cache Miss - LLC
AMNESIAC: Amnesic Automatic Computer
9
Overview
• Amnesic Compiler
• Identifies independent RSlices
• Annotates RSlices: RCMP, REC, RTN
• Amnesic Scheduler
• Decides when to recompute
• Can use various policies
• Compiler
• First-Level Cache Miss - FLC
• Last-Level Cache Miss - LLC
• Amnesic Microarchitecture
• Prevents corruption of architectural state
• Makes RSlice inputs available at the time of recomputation
AMNESIAC: Amnesic Automatic Computer
10
Amnesic Microarchitecture
SFile
Source and Destination Registers
…
AMNESIAC: Amnesic Automatic Computer
11
Amnesic Microarchitecture
Renamer
SFile
RegisterFile[#] -> SFile[#]
Source and Destination Registers
…
AMNESIAC: Amnesic Automatic Computer
12
Amnesic Microarchitecture
Renamer
SFile
Source and Destination Registers
RegisterFile[#] -> SFile[#]
…
Hist
Leaf Address
v
v
Non-Recomputable Inputs
…
…
…
…
AMNESIAC: Amnesic Automatic Computer
13
Amnesic Microarchitecture
Renamer
SFile
Source and Destination Registers
RegisterFile[#] -> SFile[#]
…
IBuff
Hist
RSlice Instructions
Leaf Address
…
v
v
Non-Recomputable Inputs
…
…
…
…
AMNESIAC: Amnesic Automatic Computer
14
Putting It All Together: Default Execution
Renamer
SFile
Source and Destination Registers
RegisterFile[#] -> SFile[#]
…
IBuff
Hist
RSlice Instructions
Leaf Address
…
v
v
Non-Recomputable Inputs
…
…
…
…
AMNESIAC: Amnesic Automatic Computer
0
Memory
or
RegisterFile
15
Putting It All Together: Default Execution
Renamer
SFile
Source and Destination Registers
RegisterFile[#] -> SFile[#]
…
IBuff
Hist
RSlice Instructions
Fetch
Logic
1
Leaf Address
…
v
v
Non-Recomputable Inputs
…
…
…
…
AMNESIAC: Amnesic Automatic Computer
0
Memory
or
RegisterFile
16
Putting It All Together: During Recomputation
Renamer
SFile
Source and Destination Registers
RegisterFile[#] -> SFile[#]
…
2
IBuff
Hist
RSlice Instructions
Fetch
Logic
1
Leaf Address
…
v
v
Non-Recomputable Inputs
…
…
…
…
AMNESIAC: Amnesic Automatic Computer
0
Memory
or
RegisterFile
17
Putting It All Together: During Recomputation
Renamer
SFile
RegisterFile[#] -> SFile[#]
Hist
RSlice Instructions
Fetch
Logic
1
…
3
2
IBuff
Execution
Units
Source and Destination Registers
Leaf Address
…
v
v
Non-Recomputable Inputs
…
…
…
…
AMNESIAC: Amnesic Automatic Computer
4
0
Memory
or
RegisterFile
18
Putting It All Together: During Recomputation
Renamer
SFile
Source and Destination Registers
RegisterFile[#] -> SFile[#]
IBuff
Hist
RSlice Instructions
Fetch
Logic
1
Execution
Units
…
3
2
5
Leaf Address
…
v
v
Non-Recomputable Inputs
…
…
…
…
AMNESIAC: Amnesic Automatic Computer
4
0
Memory
or
RegisterFile
19
Putting It All Together: During Recomputation
Renamer
6
Source and Destination Registers
RegisterFile[#] -> SFile[#]
IBuff
Execution
Units
Hist
RSlice Instructions
Fetch
Logic
5
…
3
2
1
SFile
Leaf Address
…
v
v
Non-Recomputable Inputs
…
…
…
…
AMNESIAC: Amnesic Automatic Computer
4
0
Memory
or
RegisterFile
20
Putting It All Together: During Recomputation
Renamer
6
RegisterFile[#] -> SFile[#]
IBuff
5
…
7
8 Execution
Units
Hist
RSlice Instructions
Fetch
Logic
Source and Destination Registers
3
2
1
SFile
Leaf Address
…
v
v
Non-Recomputable Inputs
…
…
…
…
AMNESIAC: Amnesic Automatic Computer
4
0
Memory
or
RegisterFile
21
Evaluation Setup
• Benchmarks: SPEC2006, NAS, Parsec, Rodinia
• 33 benchmarks evaluated, 11 of them show > 10% EDP gain
• Amnesic compiler pass mimicked by binary generator Pintool
• Microarchitecture and scheduler implemented in Snipersim
AMNESIAC: Amnesic Automatic Computer
22
Energy-Delay Product
mcf
sx
cg
is
ca
fs
fe
AMNESIAC: Amnesic Automatic Computer
rt
bp
bfs
sr
23
Energy-Delay Product
mcf
sx
cg
is
ca
fs
fe
AMNESIAC: Amnesic Automatic Computer
rt
bp
bfs
sr
24
Energy-Delay Product
mcf
sx
cg
is
ca
fs
fe
AMNESIAC: Amnesic Automatic Computer
rt
bp
bfs
sr
25
Energy-Delay Product
mcf
sx
cg
is
ca
fs
fe
AMNESIAC: Amnesic Automatic Computer
rt
bp
bfs
sr
26
Energy-Delay Product
up to 87% (24.92% on average ) EDP gain obtained
mcf
sx
cg
is
ca
fs
fe
AMNESIAC: Amnesic Automatic Computer
rt
bp
bfs
sr
27
Instruction Mix
Benchmark
% increase in
instruction count
% decrease in load
count
mcf
4.47
6.19
sx
4.55
6.68
cg
3.97
2.11
is
17.97
49.99
ca
7.38
7.95
fs
1.83
3.08
fe
3.55
1.75
rt
1.97
6.08
bp
31.89
55.55
bfs
1.20
60.93
sr
20.02
23.33
AMNESIAC: Amnesic Automatic Computer
28
Instruction Mix
Benchmark
% increase in
instruction count
% decrease in load
count
mcf
4.47
6.19
sx
6.68
49.99% reduction in loads,
at the4.55expense of 17.97%
more instructions
cg
3.97
2.11
is
17.97
49.99
ca
7.38
7.95
fs
1.83
3.08
fe
3.55
1.75
rt
1.97
6.08
bp
31.89
55.55
bfs
1.20
60.93
sr
20.02
23.33
AMNESIAC: Amnesic Automatic Computer
29
Instruction Mix
Benchmark
% increase in
instruction count
% decrease in load
count
mcf
4.47
6.19
sx
6.68
49.99% reduction in loads,
at the4.55expense of 17.97%
more instructions
cg
3.97
2.11
is
17.97
49.99
fs
1.83
3.08
fe
3.55
1.75
rt
1.97
6.08
bp
31.89
55.55
bfs
1.20
60.93
sr
20.02
23.33
7.38
7.95 energy
translates intoca 74.68% reduction
in (load)
AMNESIAC: Amnesic Automatic Computer
30
RSlice Length
sx
bfs
AMNESIAC: Amnesic Automatic Computer
31
RSlice Length
majority of RSlices (78.32%) have less than 10 instructions
sx
bfs
AMNESIAC: Amnesic Automatic Computer
32
Summary
• Amnesic execution can minimize power and performance overhead of
• Data retrieval
• Data communication
• Amnesic execution makes the workload more compute-intensive
• Better use of classic architectures
• Up to 87% improvement in energy-delay product (avg. 24%)
AMNESIAC: Amnesic Automatic Computer
33
Questions and Comments
Please send your questions and comments to
[email protected]
AMNESIAC: Amnesic Automatic Computer
34
AMNESIAC: Amnesic Automatic Computer
Trading Computation for Communication for Energy Efficiency
Ismail Akturk and Ulya R. Karpuzcu
University of Minnesota, Twin Cities
{aktur002, ukarpuzc}@umn.edu
© Copyright 2026 Paperzz