Hardware-Software Integrated Approaches to Defend
Against Software Cache-based Side Channel Attacks
Jingfei Kong*
Onur Acıiçmez
Jean-Pierre Seifert
Huiyang Zhou
University of Central Florida
Samsung Electronics
TU Berlin & Deutsche Telekom Laboratories
University of Central Florida
Why Should We Care about Side Channel Attacks?
• Cryptographic applications are a critically important
software component in modern computers (e.g., secure
online transactions)
• Cryptographic algorithms are designed to impose an
unreasonable time and resource cost on a successful attack
 Breaking a 128-bit symmetric key by brute force: 2^128
possibilities; even a device that can check 2^60 keys per second
still needs around 9.4*2^40 years, about 700 times the age of the universe
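The brute-force estimate above (2^128 keys at 2^60 guesses per second) is easy to sanity-check; the universe-age constant below is the commonly cited ~13.8 billion years:

```python
# Sanity check of the slide's brute-force estimate.
SECONDS_PER_YEAR = 60 * 60 * 24 * 365.25
UNIVERSE_AGE_YEARS = 13.8e9            # commonly cited estimate

keys = 2 ** 128                        # size of a 128-bit key space
rate = 2 ** 60                         # guesses/second for the assumed device
years = keys / rate / SECONDS_PER_YEAR

print(f"{years:.2e} years")            # on the order of 10^13 years
print(f"{years / UNIVERSE_AGE_YEARS:.0f}x the age of the universe")
```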
• By exploiting certain features of modern microprocessors,
it may take just a few hours to recover the secret key!
University of Central Florida
2
What are Software Cache-based Side Channel Attacks?
• Side channel attacks
 exploit any observable information generated as a byproduct of the
cryptosystem implementation, e.g., power traces or electromagnetic
radiation
 to infer secret information, e.g., the secret key
• Software cache-based side channel attacks
 exploit the latency difference between cache accesses and
memory accesses
 source of information leakage: cache misses on critical data
whose addresses depend on the secret information
 mainly access-driven attacks and time-driven attacks
An Example: Advanced Encryption Standard (AES)
• one of the most popular algorithms in symmetric-key
cryptography
 16-byte input (plaintext)
 16-byte output (ciphertext)
 16-byte secret key (for standard 128-bit encryption)
 several identical rounds of 16 XOR operations and 16 table
lookups in a performance-efficient software implementation
[Figure: each table index byte is the XOR of an input/output byte with a
secret key byte; the index byte selects an entry in a lookup table]
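The round structure above can be sketched as a toy; the table values here are random placeholders, not the real AES S-box/T-table constants:

```python
import secrets

# Stand-in for an AES lookup table (256 32-bit entries); the values are
# random placeholders, NOT the actual AES constants.
T = [secrets.randbelow(2**32) for _ in range(256)]

def round_lookups(state: bytes, round_key: bytes) -> list:
    """One round, structurally: 16 XORs and 16 table lookups.
    The table index is input_byte XOR key_byte -- this key-dependent
    index is exactly what cache attacks observe."""
    return [T[s ^ k] for s, k in zip(state, round_key)]

values = round_lookups(bytes(16), bytes(range(16)))
```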
Access-driven Attacks
[Figure: a cache with lines a-d backed by main memory holding the spy
process's data and the victim process's data; when the spy probes the
cache, line b takes noticeably longer than the others, b > (a ≈ c ≈ d),
revealing that the victim's access evicted the spy's data from line b]
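The spy's observation can be mimicked with a toy direct-mapped cache model; the line count and latencies below are illustrative, not from the slides:

```python
# Toy cache model illustrating a prime+probe-style observation.
NUM_LINES, HIT, MISS = 4, 1, 100      # illustrative sizes and latencies

def probe_after_victim(victim_line: int) -> list:
    cache = {line: "spy" for line in range(NUM_LINES)}  # 1. spy primes the cache
    cache[victim_line] = "victim"                       # 2. victim evicts one line
    times = []
    for line in range(NUM_LINES):                       # 3. spy probes its data
        times.append(HIT if cache[line] == "spy" else MISS)
        cache[line] = "spy"
    return times

times = probe_after_victim(victim_line=1)
leaked = times.index(max(times))      # the slow probe reveals the victim's line
```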
Time-driven Attacks
[Figure: encryption alternates cache accesses (hit or miss) with
computation; the total execution time is affected by the cache misses.
The indices of the table lookups derive from input/output bytes XORed
with secret key bytes]
Cache-collision Time-driven Attacks on AES
Two table lookups with indices Xi ⊕ Ki and Xj ⊕ Kj (access i before access j):

Case 1: Xj ⊕ Kj ≠ Xi ⊕ Ki
 cache access j is a cache miss
(assuming no identical cache access before)

Case 2: Xj ⊕ Kj = Xi ⊕ Ki
 cache access j is a cache hit
(assuming no conflict miss in between)

Xj ⊕ Kj = Xi ⊕ Ki => Ki ⊕ Kj = Xi ⊕ Xj

Statistically speaking, Case 1 takes a longer execution time than Case 2.
Only when Ki ⊕ Kj = Xi ⊕ Xj does AES encryption exhibit the shortest
execution time.
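A minimal simulation of how this relation recovers Ki ⊕ Kj: the key bytes, latency gap, and noise model below are all assumed for illustration, not taken from the paper.

```python
import random

random.seed(0)
KI, KJ = 0x3A, 0x5C              # secret key bytes; the attack recovers KI ^ KJ
HIT, MISS = 0, 40                # assumed hit/miss latency gap (cycles)

def encrypt_time(xi: int, xj: int) -> float:
    """Toy timing model: access j hits only on a collision
    (xj ^ KJ == xi ^ KI); everything else is Gaussian noise."""
    noise = random.gauss(1000, 50)
    return noise + (HIT if (xj ^ KJ) == (xi ^ KI) else MISS)

# Average the timing over every value of d = xi ^ xj; by the collision
# relation Ki ^ Kj = Xi ^ Xj, the bucket d == KI ^ KJ is the fastest.
totals = [0.0] * 256
counts = [0] * 256
for _ in range(200_000):
    xi, xj = random.randrange(256), random.randrange(256)
    d = xi ^ xj
    totals[d] += encrypt_time(xi, xj)
    counts[d] += 1

avg = [totals[d] / counts[d] for d in range(256)]
recovered = min(range(256), key=avg.__getitem__)
```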
The Foundation of Cache-Collision Attacks
Relative execution time vs. the number of collisions in the final round
of AES (one Pentium 4 processor):

collisions:               0      1       2       3       4       5
relative time (cycles): 7.82  -8.89  -26.87  -47.77  -72.03  -97.54

A higher number of collisions means a smaller number of cache misses,
and thus a shorter encryption time.
Current Proposed Software/Hardware Countermeasures
• Software proposals
 + easy to deploy with no hardware changes
 − application specific
 − substantial performance overhead
 − data layout and code have to be changed
 − no security guarantee
• Hardware proposals
 + generic (not application specific)
 + performance efficient
 − still has some security issues
 − requires hardware changes
 − not flexible
Hardware-Software Integrated Approaches
• Hardware tackles the source of information leakage: cache
misses on critical data
• Software offers flexibility, even against future attacks
• Three approaches for enhancing the security of various
cache designs, with tradeoffs between hardware
complexity and performance overhead
 preloading to protect PLcache (from ISCA’07)
 securing RPcache (from ISCA’07) with informing loads
 securing regular caches with informing loads
Informing Loads Approach: Informing Memory Operations
• Informing load instructions
 work as normal load instructions upon cache hits
 generate a user-level exception upon cache misses
 originally proposed as lightweight support for memory
optimization (ISCA’96)
• Leverage the same information exploited by the attacks
 use informing load instructions to read data from the lookup
tables
 flexible countermeasures are provided by the software
implementation in the exception handler
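Since informing loads are proposed hardware, their semantics can only be mimicked in software; a toy model with a pluggable miss handler (all names here are illustrative stand-ins for the hardware mechanism):

```python
# Toy model of informing-load semantics: a load that behaves normally on
# a cache hit but invokes a user-level handler on a cache miss.
class InformingCache:
    def __init__(self, memory, miss_handler):
        self.memory = memory
        self.lines = {}                  # addresses currently resident
        self.miss_handler = miss_handler # user-level "exception handler"

    def informing_load(self, addr):
        if addr not in self.lines:       # miss -> user-level exception
            self.miss_handler(self, addr)
            self.lines[addr] = True      # line is now cached
        return self.memory[addr]         # normal load behavior

events = []
cache = InformingCache({0: 10, 1: 11}, lambda c, a: events.append(a))
cache.informing_load(0)   # miss: handler runs
cache.informing_load(0)   # hit: handler not invoked
```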
Defend against access-driven attacks
! Even the very first cache miss is security-critical in access-driven
attacks
• software random permutation in the AES implementation
 + randomizes the mapping between table indices and cache lines
 + obfuscates attackers’ observations
! A fixed software random permutation is vulnerable
• detect cache misses using informing loads and
perform a permutation update in the exception handler
 + every time there is a chance (a cache miss) to leak information,
the permutation is changed randomly
 + balances the tradeoff between security and performance
 Overall, a software random permutation scheme with permutation
updates only when necessary (on cache misses)
Defend against time-driven attacks
! The correlation between the secret key and the number of cache
misses
• detect cache misses using informing loads
• load all the critical data into the cache in the exception
handler
 + avoids cache misses for subsequent cache accesses
 + breaks the correlation
The Defense Procedure
[Figure: cache and main memory, holding another process’s data and
AES’s data]
0. The AES implementation uses the software random permutation
version instead of the original one.
1. Informing load instructions are used to load the critical data.
2. A cache miss on critical data is detected by an informing load
instruction; program execution is redirected to the user-level
exception handler.
3. Inside the exception handler, all critical data are preloaded into
the cache, and a permutation update is performed between the missing
cache line and a randomly chosen cache line.
The Implementation of Software Random Permutation in AES
[Figure: the original lookup table T[0…N-1], with N = K*L, is converted
into K cache-line-sized pieces reached through a pointer table
T’[0…K-1]; e.g., T[0], T[1], …, T[L-1] end up in one cache line p, later
elements in another cache line q. L is the number of elements in one
cache line.]
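The indirection can be sketched as follows; K, L, and the table contents are illustrative, not the real AES table:

```python
L = 16                       # elements per cache line (e.g., 64B line / 4B entry)
T = list(range(256))         # stand-in for the original lookup table, N = K * L
K = len(T) // L

# converted table: K "pointers", each to one cache-line-sized chunk
T_prime = [T[k * L:(k + 1) * L] for k in range(K)]

def lookup(i: int) -> int:
    # one extra indirection: pick the chunk pointer, then index inside it
    return T_prime[i // L][i % L]

# the converted layout preserves the original table contents
assert all(lookup(i) == T[i] for i in range(len(T)))
```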
Countermeasure Implementation in the Exception Handler
 preload all table data, to defend against time-driven attacks,
by prefetching from the address pointers T’[0], T’[1], …, T’[K-1]
 perform a permutation update, to defend against access-driven attacks,
by swapping both the pointers and the data
[Figure: two entries of T’[0…K-1] before and after an update; the
pointers to addresses 0x80 and 0x40 are exchanged along with the data]
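A sketch of the handler's two steps, with illustrative chunk sizes and Python lists standing in for cache-line addresses and pointer swapping:

```python
import random

random.seed(1)
L, K = 16, 16
# physical "cache lines": slot -> the table data currently stored there
slots = [list(range(k * L, (k + 1) * L)) for k in range(K)]
# pointer table: logical chunk k -> slot index currently holding its data
ptr = list(range(K))

def miss_handler(missing_chunk: int) -> None:
    # 1) preload every chunk so subsequent lookups are cache hits
    for k in range(K):
        _ = slots[ptr[k]][0]             # stands in for a prefetch of T'[k]
    # 2) permutation update: swap the missing chunk's data with a randomly
    #    chosen chunk's data, and swap the pointers so lookups still resolve
    #    correctly -- only the chunk-to-cache-line mapping changes
    other = random.randrange(K)
    a, b = ptr[missing_chunk], ptr[other]
    slots[a], slots[b] = slots[b], slots[a]
    ptr[missing_chunk], ptr[other] = b, a

def lookup(i: int) -> int:
    return slots[ptr[i // L]][i % L]

miss_handler(3)
assert all(lookup(i) == i for i in range(K * L))  # table contents unchanged
```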
Experiments
• Experimental Setup
 Default processor configuration in a MIPS-like SimpleScalar simulator
• pipeline bandwidth: 4, 128-entry ROB, 64-entry IQ, 64-entry LSQ
• 32KB 2-way I/D L1 caches, 2MB L2, 64B cache blocks
• SMT fetch policy: round-robin
 AES software implementation (baseline): OpenSSL 0.9.7c
 AES performance microbenchmark: the OpenSSL speed test program
• Security Evaluation
 impact of ILP and MLP on cache collision time-driven attacks
 security analysis on our regular caches with informing loads approach
• Performance Evaluation
 performance impact on AES
 performance impact on an SMT processor
Impact of ILP and MLP on Cache-collision Time-driven Attacks
[Figure: normalized execution time (0.5% down to -3.0%) vs. the number
of cache collisions in the final round of AES (0 to 5), for
Configurations 1-4, the default configuration, and a Pentium 4]
The more ILP and MLP, the less observable the trend:
 the weaker the correlation between the number of cache collisions
and the execution time
 the weaker the correlation between the key and the execution time,
and the more samples required for a successful attack
Security Evaluation on Regular Caches with Informing Loads
• Mitigation against access-driven attacks
(see the theoretical proof from Wang et al. at ISCA’07)
• Mitigation against cache collision time-driven attacks
[Figure: normalized execution time (0.5% down to -3.0%) vs. the number
of cache collisions in the final round of AES (0 to 5), for
Configuration 1 with a regular cache and with a regular cache +
informing loads (IL)]
Performance Impact on AES
 Performance takes a hit because of the indirection table lookup
introduced for software randomization, and because of cache conflict
misses between the lookup table data and other data, which cause lots
of exception handling.
 Performance improves most as caches grow larger / more associative,
since the overhead of cache conflict misses between the lookup table
data and other data is almost gone.
[Figure: normalized performance of the baseline vs. regular cache + IL,
for 8K, 16K, and 32K caches with 1-, 2-, and 4-way associativity]
Performance Impact on a 2-way SMT Processor
• With larger caches / higher associativity, the
performance overheads on throughput and fairness
from exception handling diminish
• Still, the indirection table lookup imposes some
performance overhead on throughput
[Figure: normalized throughput and harmonic-mean fairness of the
baseline vs. regular cache + IL, for 8K_1way, 16K_4way, 32K_4way, and
64K_4way caches; AES running with SPEC2000 INT]
Conclusions
• Software cache-based side channel attacks are emerging
threats
• Cache misses are the source of information leakage
• We proposed hardware-software integrated approaches that
provide stronger security protection across various cache
designs
 A lightweight hardware support, informing loads, is
proposed to protect regular caches with flexible software
countermeasures, at the cost of some performance overhead
 Preloading and informing loads are also proposed to
enhance the security of previously proposed secure cache
designs
Thank you!
Questions?