Cache attacks

Topics in Cryptography
Lecture 7
Topic: Side Channels
Lecturer: Moni Naor
Recap: chosen ciphertext security
• Why chosen ciphertext/malleability matters
• Taxonomy of Attacks and Security
• Ideas for achieving CCA
– Redundancy + Verification
• The NIZK approach
• Simple scheme achieving CCA1
– Based on DDH
– Modification achieving CCA2
• Chosen-Ciphertext Security via Correlated Products
• CCA and IBE
• Deniable Authentication
Adversarial Models
STANDARD MODEL:
• Abstract models of computation
– Interactive Turing machines
– Private memory, randomness
– ...
• Well-defined adversarial access
– Can model powerful attacks
REAL LIFE:
• Physical implementations leak information
• Adversarial access not always captured by abstract models
Adversarial Models
Attacks in the standard model:
• Chosen-plaintext attacks
• Chosen-ciphertext attacks
• Composition
• Self-referential encryption
• Circular encryption
• ...
Attacks outside the standard model:
• Timing attacks [Kocher 96]
• Fault detection [BDL 97, BS 97]
• Power analysis [KJJ 99]
• Cache attacks [OST 05]
• Memory attacks [HSHCPCFAF 08]
• Electromagnetic radiation analysis
• ...
Side channel:
Any information not captured by the abstract “standard” model
Countermeasures
GOALS
• Preserve functionality
• Security
• Efficiency
• Generic methods
HARDWARE
• E.g., minimizing electromagnetic leakage, “tamper-proof” devices, ...
• Ad-hoc solutions
• Typically expensive or inefficient
SOFTWARE
• E.g., fixed timing (independent of input), oblivious RAM, ...
• Many heuristics
• Require precise modeling
Thesis of this course
• Must incorporate side-channel attacks in the design of systems, and not only at implementation time
• Many tools developed in the foundations of cryptography are helpful for protecting against side-channel attacks
Proof by examples...
Side Channels
• Timing Attacks
• Cache Attacks
• Memory Attacks
Timing Attacks
• Kocher, Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems (CRYPTO 1996)
• Brumley and Boneh, Remote Timing Attacks Are
Practical, (USENIX Security 2003)
Slides based on Vitaly Shmatikov
Timing Attack
• Basic idea: learn the system’s secret by observing
how long it takes to perform various computations
• Typical goal: extract private key
• Extremely powerful because isolation doesn’t help
– Victim could be remote
– Victim could be inside its own virtual machine
– Keys could be in tamper-proof storage or smartcard
• Attacker wins simply by measuring response times
RSA Cryptosystem
• Key generation:
– Generate large primes P, Q
– Compute N = PQ and φ(N) = (P-1)(Q-1)
– Choose small e, relatively prime to φ(N)
Why?
• Typically, e = 3 or e = 2^16 + 1 = 65537
– Compute unique d such that ed = 1 mod φ(N)
– Public key = (e, N); private key = d
• Encryption of m (simplified!): c = m^e mod N
• Decryption of c: c^d mod N = (m^e)^d mod N = m
RSA Decryption
• RSA decryption: compute y^x mod N
– A modular exponentiation operation
• Naive algorithm: square and multiply
[Figure: the square-and-multiply algorithm, processing the exponent x bit by bit]
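A minimal sketch of square and multiply, assuming the exponent is scanned from its most significant bit. The function name and the returned multiply counter are illustrative additions, not part of the slides; the counter makes visible that the amount of work depends on the bits of the exponent.

```python
def square_and_multiply(y, x, n):
    """Return (y**x mod n, number of multiply steps), scanning x from its top bit."""
    result = 1
    multiplies = 0
    for bit in bin(x)[2:]:                 # most significant bit first
        result = (result * result) % n     # a squaring happens for every bit
        if bit == "1":
            result = (result * y) % n      # the multiply happens only for 1 bits
            multiplies += 1
    return result, multiplies


# Toy values, chosen only for illustration.
value, multiplies = square_and_multiply(7, 0b101101, 1009)
```

The number of multiply steps equals the number of 1 bits in the exponent, which is the data-dependent behavior a timing attack looks for.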
Basic Timing
Whether an iteration takes a long time depends on the kth bit of the secret exponent:
– If the bit is 1, the multiplication takes a while to compute
– If the bit is 0, the step is instantaneous
Old observation: timing depends on the number of 1’s
– If all multiplications take the same time, this is all you get
Not all multiplications were created equal
• Different timing given operands
• Assumption/Heuristic: timings of subsequent multiplications are independent
– Given that we know the first k-1 bits of x: exact timing of the first k-1 steps
– Given a guess for the kth bit of x: exact guess for the timing of the kth step
– Time of remaining bits independent
Given a measurement of the total time, can see whether there is a correlation between the events:
– kth step is long
– Total time is long
Outline of Kocher’s Attack
• Idea: guess some bits of the exponent;
– Predict how long decryption will take
• If guess is correct, will observe correlation; if
incorrect, then prediction will look random
– The more bits you already know, the stronger the signal,
thus easier to detect (error-correction property)
• Start by guessing a few top bits, look at correlations
for each guess, pick the most promising candidate
and continue
Works against systems under direct control
RSA in OpenSSL
• OpenSSL: popular open-source toolkit
– mod_SSL (in Apache = 28% of HTTPS market)
– stunnel (secure TCP/IP servers)
– sNFS (secure NFS)
– Many more applications
• Kocher’s attack doesn’t work against OpenSSL
– Instead of square-and-multiply, OpenSSL uses CRT,
sliding windows and two different multiplication
algorithms for modular exponentiation
• CRT = Chinese Remainder Theorem
• Secret exponent is processed in chunks, not bit-by-bit
Chinese Remainder Theorem
• n = n1 n2 … nk
where gcd(ni, nj) = 1 when i ≠ j
• The system of congruences
x = x1 mod n1, …, x = xk mod nk
– Has a simultaneous solution x to all congruences
– There exists exactly one solution x between 0 and n-1
• For RSA modulus N=PQ, to compute x mod N
enough to know x mod P and x mod Q
RSA Decryption With CRT
• To decrypt c, need to compute m = c^d mod N
• Use Chinese Remainder Theorem
– d1 = d mod (P-1)
– d2 = d mod (Q-1)
– qinv = Q^-1 mod P
(these are precomputed)
– Compute m1 = c^d1 mod P; m2 = c^d2 mod Q
– Compute m = m2 + (qinv*(m1-m2) mod P)*Q
Attack this computation in order
to learn Q
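The CRT decryption steps can be sketched as follows, taking d2 = d mod (Q-1) as the exponent for the prime Q. The primes, the message, and the function name are toy assumptions chosen for illustration and are far too small for real use; `pow(x, -1, n)` requires Python 3.8 or later.

```python
# Toy parameters (assumptions for illustration only).
P, Q = 1009, 1013
N = P * Q
e = 65537
d = pow(e, -1, (P - 1) * (Q - 1))    # private exponent

# Values precomputed once per key.
d1 = d % (P - 1)
d2 = d % (Q - 1)
qinv = pow(Q, -1, P)


def decrypt_crt(c):
    """Decrypt c with two half-size exponentiations and recombine."""
    m1 = pow(c, d1, P)
    m2 = pow(c, d2, Q)
    return m2 + ((qinv * (m1 - m2)) % P) * Q


m = 123456
c = pow(m, e, N)
```

The exponentiation modulo Q inside `decrypt_crt` is the computation whose running time the attack measures.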
Operations Involved in Decryption
What is needed to compute c^d mod Q and xy mod Q?
• Exponentiation
– Sliding windows
• Multiplication routines
– “Normal” - when operands have unequal length
– Karatsuba - faster when operands have equal length: O(n^(log2 3))
• Modular reduction
– Montgomery reduction
Time of these operations is input sensitive
Montgomery Reduction
• Decryption requires computing m2 = cd2 mod Q
• Done by repeated multiplication
– Simple: square and multiply (process d2 one bit at a time)
– More clever: sliding windows (process d2 in 5-bit blocks)
• In either case, many multiplications modulo Q
• Multiplications use Montgomery reduction
– Pick some R = 2^k
– To compute x·y mod Q: convert x and y into their Montgomery form xR mod Q and yR mod Q
– Compute (xR * yR) * R^-1 = zR mod Q
• Multiplication by R^-1 can be done very efficiently: R is a power of 2, which avoids long divisions
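A rough sketch of Montgomery multiplication for an odd modulus q and R = 2^k > q, written with Python integers. The function names and the returned `extra` flag are assumptions made for illustration; the flag records whether the final conditional subtraction ran, which is the step whose frequency depends on the input.

```python
def montgomery_setup(q, k):
    """Precompute R = 2**k and -q^{-1} mod R for an odd modulus q < R."""
    R = 1 << k
    q_neg_inv = (-pow(q, -1, R)) % R
    return R, q_neg_inv


def montgomery_reduce(t, q, k, q_neg_inv):
    """Return (t * R^{-1} mod q, extra) for 0 <= t < R*q."""
    r_mask = (1 << k) - 1
    u = ((t & r_mask) * q_neg_inv) & r_mask
    z = (t + u * q) >> k          # exact division by R, done with a shift
    if z >= q:                    # the data-dependent extra reduction
        return z - q, True
    return z, False


def montgomery_multiply(x, y, q, k):
    """Return (x*y mod q, extra) by going through Montgomery form."""
    R, q_neg_inv = montgomery_setup(q, k)
    x_bar = (x * R) % q                                               # xR mod q
    y_bar = (y * R) % q                                               # yR mod q
    z_bar, extra = montgomery_reduce(x_bar * y_bar, q, k, q_neg_inv)  # zR mod q
    z, _ = montgomery_reduce(z_bar, q, k, q_neg_inv)                  # leave Montgomery form
    return z, extra


product, extra_reduction = montgomery_multiply(123, 456, 1013, 10)
```

In a real exponentiation the operands stay in Montgomery form across all multiplications, so the conversion cost is paid only once; the per-call conversion here just keeps the example short.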
Schindler’s Observation
• At the end of Montgomery reduction: if zR > Q, then
need to subtract Q
– Probability of this extra step is proportional to c mod Q
• If c is close to Q, many subtractions will be done
• If c mod Q = 0, very few subtractions
– Decryption will take longer as c gets closer to Q, then
become fast as c passes a multiple of Q
• By playing with different values of c and observing
how long decryption takes, attacker can guess Q!
If all other operations are fixed!
Reduction Timing Dependency
[Figure: decryption time as a function of the value of ciphertext c, with Q, 2Q, and P marked on the horizontal axis]
Integer Multiplication Routines
• 30-40% of OpenSSL running time is spent on integer
multiplication
• If integers have the same number of words n,
OpenSSL uses Karatsuba multiplication
– Takes O(n^(log2 3))
• If integers have unequal number of words n and m,
OpenSSL uses normal multiplication
– Takes O(nm)
Summary of Time Dependencies
                          g < q      g > q
Montgomery effect         Longer     Shorter
Multiplication effect     Shorter    Longer
g is the decryption value (same as c)
Different effects… but one will always dominate!
Attack Is Binary Search
[Figure: decryption time vs. value of ciphertext around Q, showing the #reductions curve, the multiplication routine curve, and the resulting 0-1 gap]
Attack Overview
• Initial guess g for Q between 2^511 and 2^512
• Try all possible guesses for the top few bits
• Suppose we know i-1 top bits of Q. Goal: ith bit
– Set g =…known i-1 bits of Q …000000
– Set ghi=…known i-1 bits of Q …100000 (note: g<ghi)
• If g<Q<ghi then the ith bit of Q is 0
• If g<ghi<Q then the ith bit of Q is 1
• Goal: decide whether g<Q<ghi or g<ghi<Q
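The bit-by-bit search can be sketched with the timing measurement replaced by an idealized comparison oracle. Everything named here (`recover_q`, `oracle`, `SECRET_Q`) is hypothetical; in the real attack the answer to "is ghi below Q?" is inferred from decryption-time differences, not obtained directly.

```python
def recover_q(oracle, total_bits, known_prefix, prefix_len):
    """Recover Q from the top bit down, given its first prefix_len bits.

    oracle(ghi) stands in for the timing measurement: it returns True when
    ghi does not exceed Q (bit is 1) and False when g < Q < ghi (bit is 0).
    """
    g = known_prefix << (total_bits - prefix_len)
    for i in range(prefix_len, total_bits):
        ghi = g | (1 << (total_bits - 1 - i))   # g with bit i set to 1
        if oracle(ghi):
            g = ghi                             # bit i of Q is 1
        # otherwise bit i of Q is 0 and g stays as it is
    return g


# Toy 12-bit secret, for illustration only.
SECRET_Q = 0b110101101011
recovered = recover_q(lambda ghi: ghi <= SECRET_Q, 12, 0b11, 2)
```

Each loop iteration corresponds to one round of the attack: build g and ghi from the known bits, decide which side of Q the value ghi falls on, and fix one more bit.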
Two Possibilities for ghi
[Figure: decryption time vs. value of ciphertext, with the #reductions and multiplication routine curves, and g and the two candidate positions of ghi marked relative to Q]
• If g<ghi<Q: difference in decryption times between g and ghi will be small
• If g<Q<ghi: difference in decryption times between g and ghi will be large
Timing Attack Details
• What is “large” and “small”?
– Know from attacking previous bits
• Decrypting just g does not work because of sliding windows
– Decrypt a neighborhood of values near g: g, g+1, …, g+
– Will increase difference between large and small values, resulting in larger 0-1 gap
• Attack required only 2 hours, about 1.4 million queries to recover the private key
– Only need to recover the most significant half of the bits of q
The 0-1 Gap
[Figure: plot illustrating the zero-one gap]
Extracting RSA Private Key
[Figure: plot of the zero-one gap, with one region where Montgomery reduction dominates and one where the multiplication routine dominates]
Normal SSL Handshake
1. Regular client → SSL server: ClientHello
2. SSL server → regular client: ServerHello (send public key)
3. Regular client → SSL server: ClientKeyExchange (encrypted under public key)
Exchange data encrypted with new shared key
Attacking SSL Handshake
1. Attacker → SSL server: ClientHello
2. SSL server → attacker: ServerHello (send public key)
3. Attacker records time t1 and sends guess g or ghi
4. SSL server → attacker: Alert
5. Attacker records time t2 and computes t2–t1
Works On The Network
Similar timing on
WAN vs. LAN
Defenses
• Require statically that all decryptions take the
same time
– For example, always do the extra “dummy” reduction
– … but what if compiler optimizes it away?
• Dynamically make all decryptions the same or
multiples of the same time “quantum”
– Now all decryptions have to be as slow as the slowest
decryption
• Use RSA blinding
RSA Blinding
• Instead of decrypting ciphertext c, decrypt a random ciphertext related to c
– Choose random r ∈ Z_N^*
– Compute x’ = c·r^e mod N
(r and r^e can be prepared ahead)
– Decrypt x’ to obtain m’ = x’^d mod N
– Calculate original plaintext m = m’/r mod N
• Since r is random, decryption time is independent of ciphertext
• 2-10% performance penalty
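A small sketch of blinded decryption, assuming a toy key built from tiny primes (illustrative values only, not a secure parameter choice). The function name `blinded_decrypt` is hypothetical; `pow(x, -1, n)` requires Python 3.8 or later.

```python
import math
import secrets

# Toy key (assumption for illustration only).
P, Q = 1009, 1013
N = P * Q
e = 65537
d = pow(e, -1, (P - 1) * (Q - 1))


def blinded_decrypt(c):
    """Decrypt c by exponentiating a randomized ciphertext instead of c itself."""
    while True:
        r = secrets.randbelow(N - 2) + 2        # candidate in [2, N-1]
        if math.gcd(r, N) == 1:                 # keep only invertible r
            break
    x_blind = (c * pow(r, e, N)) % N            # x' = c * r^e mod N
    m_blind = pow(x_blind, d, N)                # m' = x'^d = m * r mod N
    return (m_blind * pow(r, -1, N)) % N        # m = m' / r mod N


plaintext = 4242
recovered = blinded_decrypt(pow(plaintext, e, N))
```

The secret-dependent exponentiation runs on `x_blind`, which the attacker neither chooses nor sees, so the ciphertext-to-timing relationship exploited above is no longer available.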
Blinding Works
Cache Attacks
• Cryptanalysis through Cache Address Leakage:
Dag Arne Osvik, Adi Shamir , Eran Tromer
Slides based on Eran Tromer
Cache attacks
• Pure software
• No special privileges
• No interaction with the cryptographic code
• Very efficient
– Full AES key extraction from Linux encrypted partition in 65 milliseconds
• Compromise otherwise well-secured systems
• “Commoditize” side-channel attacks:
– Easily deployed software breaks many common
systems
Why cache?
[Figure: CPU core, cache, main memory]
                          CPU core                 Main memory
Annual speed increase:    60% (until recently)     7-9%
Typical latency:          0.3ns                    50-150ns
→ timing gap
Address leakage
The cache is a shared resource:
• cache state affects, and is affected by, all
processes, leading to crosstalk between
processes.
• The cached data is subject to memory protection…
– Not attacked
• But the “metadata” leaks information about
memory access patterns:
Which addresses are being accessed.
Associative memory cache
[Figure: DRAM is divided into memory blocks (64 bytes), which map to cache lines (64 bytes) in the cache]
S-box tables in memory
[Figure: the S-box tables in DRAM and the cache locations they map to]
Detecting access to AES tables
Measurement technique
Two approaches to exploit Inter-process crosstalk:
• Measuring the effect of the cache on the
encryption
– Need precise timing
• Measuring the effect of the encryption on the
cache
Measuring effect of cache on encryption
1. Make sure the tables are cached
2. Evict one cache set
3. Time an encryption and see if it’s slow
Measuring effect of encryption on cache
1. Completely evict tables from cache
2. Trigger a single encryption
3. Access attacker memory again. See which cache sets are slow
Advantages of second method
• Yields more information (64) from a single
encryption
• Insensitive to timing variance in encryption code
path
• No real need to trigger the encryption – can wait until
it happens by itself
A typical software implementation of AES
char p[16], k[16];                      // plaintext and key
int32 T0[256],T1[256],T2[256],T3[256];  // lookup tables
int32 Col[4];                           // intermediate state
...
/* Round 1 */
Col[0] = T0[p[ 0]^k[ 0]] ^ T1[p[ 5]^k[ 5]] ^
         T2[p[10]^k[10]] ^ T3[p[15]^k[15]];
Col[1] = T0[p[ 4]^k[ 4]] ^ T1[p[ 9]^k[ 9]] ^
         T2[p[14]^k[14]] ^ T3[p[ 3]^k[ 3]];
Col[2] = T0[p[ 8]^k[ 8]] ^ T1[p[13]^k[13]] ^
         T2[p[ 2]^k[ 2]] ^ T3[p[ 7]^k[ 7]];
Col[3] = T0[p[12]^k[12]] ^ T1[p[ 1]^k[ 1]] ^
         T2[p[ 6]^k[ 6]] ^ T3[p[11]^k[11]];
lookup index = plaintext ⊕ key
Synchronous attack
• A software service performs AES encryption
using a secret key.
• An attacker process runs on the same CPU.
• The attacker process can somehow invoke the
service on known plaintext.
• Examples:
– Encrypted disk partition + filesystem
– IP/Sec, VPN
Synchronous attack on AES: Overview
• Measure (possibly noisy) cache usage of many encryptions of known plaintexts.
• Guess the first key byte. For each hypothesis:
– For each sampled plaintext, predict which cache line is accessed by “T0[p[ 0]^k[ 0]]”
• Identify the hypothesis which yields maximal correlation between predictions and measurements.
• Proceed for the rest of the key bytes.
• Practically, a few hundred samples suffice.
Got 64 bits of the key (high nibble of each byte)!
• Use these partial results to mount an attack on further AES rounds, exploiting S-box nonlinearity.
A few thousand samples for complete key recovery.
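The first-round step can be simulated in a heavily idealized form. The sketch assumes 4-byte table entries and 64-byte cache lines (so 16 entries per line), a table aligned to a line boundary, and a noise-free measurement that reveals exactly one line per encryption. Real measurements are noisy and cover a whole encryption, and all names below are hypothetical.

```python
import random

ENTRIES_PER_LINE = 16    # 64-byte cache line / 4-byte table entry (assumed layout)


def observed_line(p_byte, k_byte):
    """Idealized measurement: the T0 cache line touched by T0[p ^ k]."""
    return (p_byte ^ k_byte) // ENTRIES_PER_LINE


def recover_high_nibble(samples):
    """samples: list of (plaintext byte, observed line). Returns the best-scoring guess."""
    best_guess, best_score = None, -1
    for guess in range(16):
        # Only the high nibble of the key byte selects the line: 16 hypotheses.
        k_hyp = guess << 4
        score = sum(
            1 for p, line in samples
            if (p ^ k_hyp) // ENTRIES_PER_LINE == line
        )
        if score > best_score:
            best_guess, best_score = guess, score
    return best_guess


rng = random.Random(0)
secret_key_byte = 0xA7
plaintext_bytes = [rng.randrange(256) for _ in range(200)]
samples = [(p, observed_line(p, secret_key_byte)) for p in plaintext_bytes]
guess = recover_high_nibble(samples)
```

Because a cache line holds 16 consecutive table entries, the low four bits of the index never affect which line is touched. That is why this step yields only the high nibble of each key byte.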
Protection: The Oblivious RAM Model
Oblivious Turing Machine:
• At any point in time know where the heads are
– The access pattern is independent of the input
• Important: to convert to circuits
• Get good results for the Cook-Levin Theorem
Oblivious RAM
• The access pattern is independent of the input
– Probability distribution!
Suggested by Goldreich 1987
Model
[Figure: a CPU with small private memory sends request qi to main memory and receives M[qi]]
Oblivious RAM Requirements
Any sequence of locations i1, i2, …
induces a distribution on sequences of requests
q1, q2…
• Functionality: should be able to figure out the
original content
• Security: for any two sequences of locations i1, i2, …
and i’1, i’2, …, the induced distributions of requests
should be indistinguishable
Oblivious RAM Constructions
• Trivial: O(n) slowdown
– O(log n) bits private memory
• Known: polylog slowdown [Goldreich-Ostrovsky 96]
– O(log n) bits private memory
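The trivial construction can be sketched as a memory that touches every cell on every access, so the sequence of physical addresses is the same whatever location was requested. The class name and the `trace` list (standing in for what an observer of the memory bus sees) are illustrative assumptions. A real construction would also have to hide cell contents, for example by re-encrypting each cell as it is written back, which this sketch omits.

```python
class TrivialORAM:
    """O(n) slowdown: every logical access scans the whole memory."""

    def __init__(self, n):
        self.memory = [0] * n
        self.trace = []                      # physical addresses touched, in order

    def _scan(self, index, new_value=None):
        result = None
        for addr in range(len(self.memory)):
            self.trace.append(addr)
            value = self.memory[addr]        # read every cell
            if addr == index:
                result = value
                if new_value is not None:
                    value = new_value
            self.memory[addr] = value        # write every cell back
        return result

    def read(self, index):
        return self._scan(index)

    def write(self, index, value):
        self._scan(index, value)


ram_a, ram_b = TrivialORAM(8), TrivialORAM(8)
ram_a.write(2, 99)
ram_b.write(5, 11)
value_a = ram_a.read(2)
value_b = ram_b.read(7)
```

Only the loop counter and the requested index have to be kept private, which matches the O(log n) bits of private memory.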
Memory Attacks [HSHCPCFAF 08]
• Concern: not only computation leaks information
• Memory retains its content after power is lost
[Figure: memory contents after 5 seconds, 30 seconds, 60 seconds, and 5 minutes without power]
http://citp.princeton.edu/memory
• Memory content can even last for several minutes
• Recover “noisy” keys
– Can use redundancy in round keys
• Cold boot attacks
– Completely compromise popular disk encryption systems
– Reconstruct DES, AES, and RSA keys
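Public-Key Encryption
Semantic security [GM82] under CPA:
For any m0 and m1 infeasible to distinguish Epk(m0) and Epk(m1)
1. Challenger generates (sk, pk) and sends pk to the adversary
2. Adversary sends m0, m1
3. Challenger chooses b ← {0,1} and sends Epk(mb)
4. Adversary outputs b’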
Key-Leakage Attacks
Semantic security with key leakage [AGV 09] (Akavia, Goldwasser and Vaikuntanathan):
For any* leakage f(sk) and for any m0 and m1 infeasible to distinguish Epk(m0) and Epk(m1)
1. Challenger generates (sk, pk) and sends pk to the adversary
2. Adversary sends a leakage function f and receives f(sk)
3. Adversary sends m0, m1
4. Challenger chooses b ← {0,1} and sends Epk(mb)
5. Adversary outputs b’
• Clearly, cannot allow f(sk) that easily reveals sk
• For now f : SK → {0,1}^λ for λ < |sk|
Is this the right model?
• Noisy leakage
– as opposed to low-bandwidth leakage
• Leakage of intermediate values
– Are intermediate values always erased?
– Key generation process
– Decryption process
• Keys generated using a “weak” random source
Not a perfect model, but still a good starting point
Discuss extensions later on
What We Know
• A generic method for protecting against key-leakage attacks
– Main building block: Hash Proof Systems [CS 02]
– Efficient instantiations (based on decisional Diffie-Hellman, few exponentiations)
• Chosen-ciphertext key-leakage attacks
– A generic CPA-to-CCA transformation
– Efficient schemes
• Extensions
– Noisy leakage
– Leakage of intermediate values
– Weak random sources
Outline of the Talk
• Some tools
• The generic construction by examples
– A simple scheme: λ ≈ |sk|/2
– Improved schemes: λ ≈ |sk|
• Extensions of the model
• Conclusions, further work, and some rest...
Min-Entropy
Probability distribution X over {0,1}^n
H_∞(X) = - log max_x Pr[X = x]
Represents the probability of the most likely value of X
X is a k-source if H_∞(X) ≥ k
(i.e., Pr[X = x] ≤ 2^-k for all x)
Statistical distance:
Δ(X,Y) = ½ Σ_a |Pr[X=a] – Pr[Y=a]|
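Both quantities are easy to compute for small, explicitly given distributions. The sketch below represents a distribution as a dict from outcomes to probabilities (a representation chosen here for illustration) and uses the usual convention in which the statistical distance carries a factor of 1/2, so that it lies between 0 and 1.

```python
import math


def min_entropy(dist):
    """-log2 of the probability of the most likely outcome."""
    return -math.log2(max(dist.values()))


def statistical_distance(dist_x, dist_y):
    """Half the L1 distance between two distributions over a common support."""
    support = set(dist_x) | set(dist_y)
    return 0.5 * sum(
        abs(dist_x.get(a, 0.0) - dist_y.get(a, 0.0)) for a in support
    )


skewed = {"00": 0.5, "01": 0.25, "10": 0.25}
fair_coin = {0: 0.5, 1: 0.5}
constant = {0: 1.0}

h = min_entropy(skewed)
dist = statistical_distance(fair_coin, constant)
```

Here `skewed` is a 1-source: its most likely value has probability 1/2, even though three values are possible.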
Extractors
Universal procedure for “purifying” an imperfect source
Definition:
Ext: {0,1}^n × {0,1}^d → {0,1}^ℓ is a (k,ε)-extractor if for any k-source X
Δ(Ext(X, U_d), U_ℓ) ≤ ε
[Figure: EXT takes x from a k-source of length n and a “seed” s of d random bits, and outputs ℓ almost-uniform bits]
Strong Extractors
Output looks random even after seeing the seed
Definition:
Ext: {0,1}^n × {0,1}^d → {0,1}^ℓ is a (k,ε)-strong extractor if
Ext’(x, s) = s ◦ Ext(x,s)
is a (k,ε)-extractor
Leftover hash lemma [ILL 89]:
Pairwise independent hash functions are strong extractors
Example: Ext(x, (a,b)) = first ℓ bits of ax+b over GF[2^n]
• Output length ℓ = k – 2log(1/ε)
• Seed length d = 2n; with almost pairwise independence, d = O(log n + k)
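The ax+b example can be sketched for n = 8. The field GF(2^8) is represented here with the AES reduction polynomial x^8 + x^4 + x^3 + x + 1, which is an assumed choice (any irreducible polynomial of degree 8 would do), and the function names are illustrative.

```python
IRREDUCIBLE = 0x11B    # x^8 + x^4 + x^3 + x + 1 (assumed field representation)
N_BITS = 8


def gf_mul(a, b):
    """Multiply two elements of GF(2^8) using shift-and-XOR with reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << N_BITS):
            a ^= IRREDUCIBLE
    return result


def ext(x, seed, out_bits):
    """Return the first out_bits bits of a*x + b, where seed = (a, b)."""
    a, b = seed
    y = gf_mul(a, x) ^ b                # addition in GF(2^n) is XOR
    return y >> (N_BITS - out_bits)     # keep the most significant bits


sample = ext(0xAB, (0x57, 0x83), 4)
```

The seed (a, b) consists of two field elements, which accounts for the seed length d = 2n.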
Decisional Diffie-Hellman
• Alice sends g^x; Bob sends g^y
• Both parties compute K = g^xy
DDH assumption:
(g, g^x, g^y, g^xy) ≈ (g, g^x, g^y, g^z)
for random x, y, z ∈ Z_q
Equivalently:
(g1, g2, g1^r, g2^r) ≈ (g1, g2, g1^r1, g2^r2)
for random g1, g2 ∈ G and r, r1, r2 ∈ Z_q
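The key agreement itself is short enough to write out. The group below, the squares modulo the safe prime p = 2q + 1 with p = 2039 and q = 1019, is a toy choice made for illustration; DDH is only plausible in groups that are astronomically larger.

```python
import secrets

# Toy group (assumption): the order-q subgroup of squares modulo p = 2q + 1.
p, q = 2039, 1019
g = 4                              # 2 squared, so g lies in the subgroup

x = secrets.randbelow(q)           # Alice's secret exponent
y = secrets.randbelow(q)           # Bob's secret exponent

alice_sends = pow(g, x, p)         # g^x
bob_sends = pow(g, y, p)           # g^y

k_alice = pow(bob_sends, x, p)     # (g^y)^x
k_bob = pow(alice_sends, y, p)     # (g^x)^y
```

DDH says that an eavesdropper who sees g^x and g^y cannot tell the resulting key from a random group element.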
A Simple Scheme
• G - group of order q
• Ext : G × {0,1}^d → {0,1} - strong extractor
Key generation
• Choose g1, g2 ∈ G and x1, x2 ∈ Z_q
• Let h = g1^x1 g2^x2
• Output sk = (x1, x2) and pk = (g1, g2, h)
MAIN IDEA:
• Redundancy: any pk corresponds to many possible sk’s
• h = g1^x1 g2^x2 reveals only log(q) bits of information on sk = (x1, x2)
• Leakage of λ bits ⇒ sk still has min-entropy log(q) - λ
Enc_pk(m)
• Choose r ∈ Z_q and a seed s ∈ {0,1}^d
• Output (g1^r, g2^r, s, Ext(h^r, s) ⊕ m)
Dec_sk(u1, u2, s, e)
• Output e ⊕ Ext(u1^x1 u2^x2, s)
Correctness: u1^x1 u2^x2 = g1^(r x1) g2^(r x2) = (g1^x1 g2^x2)^r = h^r
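A toy end-to-end sketch of the scheme for one-bit messages. Two things are assumptions made for illustration: the group (squares modulo the safe prime 2039, of prime order 1019) and the extractor, for which a one-bit inner product with the seed stands in. The lecture leaves Ext abstract, and nothing here is a secure parameter choice.

```python
import secrets

p, q = 2039, 1019       # toy group: squares modulo p = 2q + 1, of prime order q
SEED_BITS = 11          # group elements are below 2**11


def ext(element, seed):
    """Stand-in extractor: inner product mod 2 of the element's bits and the seed."""
    return bin(element & seed).count("1") % 2


def random_generator():
    """Pick a random element of the order-q subgroup other than 1."""
    while True:
        g = pow(secrets.randbelow(p - 2) + 2, 2, p)
        if g != 1:
            return g


def keygen():
    g1, g2 = random_generator(), random_generator()
    x1, x2 = secrets.randbelow(q), secrets.randbelow(q)
    h = (pow(g1, x1, p) * pow(g2, x2, p)) % p
    return (x1, x2), (g1, g2, h)


def encrypt(pk, m):
    g1, g2, h = pk
    r = secrets.randbelow(q)
    s = secrets.randbits(SEED_BITS)
    return pow(g1, r, p), pow(g2, r, p), s, ext(pow(h, r, p), s) ^ m


def decrypt(sk, ciphertext):
    x1, x2 = sk
    u1, u2, s, e = ciphertext
    shared = (pow(u1, x1, p) * pow(u2, x2, p)) % p    # equals h^r
    return e ^ ext(shared, s)


sk, pk = keygen()
```

Decryption never uses r: the pair (x1, x2) is enough to recompute h^r from (u1, u2), and many different pairs are consistent with the same h, which is the redundancy the leakage argument relies on.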
A Simple Scheme
Theorem: The scheme is resilient to any leakage of λ ≈ log(q) bits (half the size of sk)
Proof by reduction:
Adversary for the encryption scheme ⇒ Distinguisher for decisional Diffie-Hellman