A Multi-Round Side Channel Attack on AES using
Belief Propagation
Hélène Le Bouder, Ronan Lashermes, Yanis Linge, Gaël Thomas, Jean-Yves
Zie
To cite this version:
Hélène Le Bouder, Ronan Lashermes, Yanis Linge, Gaël Thomas, Jean-Yves Zie. A Multi-Round Side Channel Attack on AES using Belief Propagation. FPS 2016, Oct 2016, Québec,
Canada.
HAL Id: hal-01405793
https://hal.inria.fr/hal-01405793
Submitted on 30 Nov 2016
Distributed under a Creative Commons Attribution - NonCommercial - NoDerivatives 4.0
International License
Hélène Le Bouder¹, Ronan Lashermes¹, Yanis Linge², Gaël Thomas³, and Jean-Yves Zie⁴
¹ LHS-PEC TAMIS, INRIA, Campus Beaulieu 35042 Rennes, France
² STMicroelectronics, 190 Avenue Coq 13106 Rousset, France
³ Orange Labs, 44 Avenue de la République 92320 Châtillon, France
⁴ CEA-TECH, Centre de Microélectronique de Provence 13120 Gardanne, France
Abstract. This paper presents a new side channel attack to recover a
block cipher key. No plaintext and no ciphertext are required, no templates are built. Only the leakage measurements collected in many different rounds of the algorithm are exploited. The leakage is considered as
a Hamming weight with a Gaussian noise. The chosen target is the Advanced Encryption Standard (AES). Bayesian inference is used to score
all guesses on several consecutive round-key bytes. From these scores a
Belief Propagation algorithm is used, based on the relations of the KeyExpansion, to discriminate the unique correct guess. Theoretical results
according to various noise models are obtained with simulations.
Keywords: Side Channel Analysis, Hamming Weight, AES, Key Expansion,
Multi-Round Attack, Bayesian Inference, Belief Propagation.
1 Introduction
Security is a key component of information and communication technologies.
Even if an encryption algorithm is proved mathematically secure, cryptanalysis
has another dimension: physical attacks. These attacks rely on the interaction
of the computing unit with its physical environment.
The Side Channel Analyses (SCA) are physical attacks based on observations of the circuit behavior. They exploit the fact that some physical values
(timing, power consumption, electromagnetic emissions (EM)) of a device depend on intermediate values of the computation. This is the so-called leakage of
information of the circuit.
The Advanced Encryption Standard (AES) has been chosen as a target because it is the most widespread block cipher. Yet, our approach would be the
same for any other block cipher.
Motivations: Generally, SCA as in [1,2] links a text with a measurement.
As a consequence, the attack usually targets the first or the last round. But the framework
described in [3] suggests that other kinds of attack-paths are possible.
Our first idea is to draw an attack-path linking a leakage measurement with
another one. Another approach in SCA is the template attack [4,5], which compares
traces from the targeted device with traces from a profiling device.
In this paper, the main motivation was to build an attack which uses only
traces, no text and no template. We are in the case of an attacker who can just
observe a leakage but has no access to the device’s input/output.
The great majority of side-channel attacks published in the literature follows a divide and conquer strategy. In the case of AES, bytes of a round-key
are attacked one at a time. So the other main motivation in our approach is to
attack different round-keys of the AES and use links between them to improve
the probability to find the correct guess.
Contribution: This paper presents a new side channel attack. It is a multi-round attack, where no template and no texts (neither plaintexts nor ciphertexts)
are used; only leakage measurements are required. In our attack the leakage is
considered as a Hamming weight with a Gaussian noise. Bayesian inference is
used to obtain scores for the possible values of the different round-key bytes.
Then, the main idea is to use a Belief Propagation (BP) algorithm to cross
information between them, in order to have a key which respects the rules of
KeyExpansion.
Organization of this paper: The paper is organized as follows. The general
context is first introduced in section 2. Our attack is divided in two steps. A first
analysis on each round is described in section 3. Then in section 4, the results
of the analysis of this first step are linked using the BP algorithm. Results are
presented in section 5. Finally the conclusion is drawn in section 6.
2 Preliminaries
2.1 The targeted encryption algorithm: AES
The algorithm: The Advanced Encryption Standard is a standard established by the NIST [6] for symmetric key cryptography. It is a block cipher.
The encryption first consists in mapping the plaintext T of 128 bits into a two-dimensional array of 4 · 4 = 16 bytes called the State. Rows and columns are
respectively noted l and c. Then, after a preliminary xor (the bit-wise xor is
noted ⊕) between the input and the key $K_0$, the AES executes 10 times a
round-function that operates on the State. The operations used during these
rounds are:
– SubBytes, composed of non-linear transformations: 16 S-boxes noted SB,
working independently on individual bytes of the State.
– ShiftRows noted SR, a byte-shifting operation on each row of the State.
– MixColumns noted MC, a linear matrix multiplication on $GF(2^8)$, working on each column of the State.
– AddRoundKey, a xor between the State and the round-key $K_r$, r ∈ [[0, 10]].
The derived key: K denotes the master key. The size of the master key
is 128 bits. $K_r$ is the round-key used at round r; it is represented by a two-dimensional array of 4 · 4 bytes, like the State. $K_r^{l,c}$ is the round-key byte at row l
and column c. The round-key $K_{r+1}$ depends on the round-key $K_r$, with $K_0 = K$.
More precisely, the round-keys are computed with a KeyExpansion function
described by the system of equations (1), where SB is the S-box function and
Rcon is a constant matrix of size 4 · 10.
$$
\begin{cases}
K_{r+1}^{l,0} = K_r^{l,0} \oplus Rcon(l, r) \oplus SB\left(K_r^{(l+1) \bmod 4,\,3}\right) & \forall l \in [[0, 3]] \\
K_{r+1}^{l,c} = K_r^{l,c} \oplus K_{r+1}^{l,c-1} & \forall l \in [[0, 3]] \text{ and } c \in [[1, 3]]
\end{cases}
\qquad (1)
$$
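The system (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`gf_mul`, `build_sbox`, `bytes_to_state`, `state_to_bytes`, `next_round_key`) are ours, the S-box is rebuilt from its algebraic definition instead of being stored as a table, the key bytes are assumed to fill the array column by column as in FIPS 197, and Rcon(l, r) is taken to be 2^r in GF(2^8) on row 0 and zero on the other rows.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product


def build_sbox() -> list:
    """Rebuild the AES S-box: field inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for shift in range(1, 5):  # xor of the four left rotations of inv
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


def bytes_to_state(data: bytes) -> list:
    """Map 16 key bytes to key[l][c] (row l, column c), column by column."""
    return [[data[4 * c + l] for c in range(4)] for l in range(4)]


def state_to_bytes(state: list) -> bytes:
    """Inverse of bytes_to_state."""
    return bytes(state[l][c] for c in range(4) for l in range(4))


def next_round_key(key: list, r: int, sbox: list) -> list:
    """Compute K_{r+1} from K_r following the two lines of system (1)."""
    rcon = 1
    for _ in range(r):  # assumed convention: Rcon(0, r) = 2^r in GF(2^8)
        rcon = gf_mul(rcon, 2)
    nxt = [[0] * 4 for _ in range(4)]
    for l in range(4):  # first line of (1): column 0
        constant = rcon if l == 0 else 0
        nxt[l][0] = key[l][0] ^ constant ^ sbox[key[(l + 1) % 4][3]]
    for c in range(1, 4):  # second line of (1): columns 1 to 3
        for l in range(4):
            nxt[l][c] = key[l][c] ^ nxt[l][c - 1]
    return nxt
```

Calling `next_round_key(bytes_to_state(master_key), 0, build_sbox())` gives the first round-key; iterating over r = 0, ..., 9 gives all ten derived round-keys.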
2.2 Overview of our attack and state of the art
In this paper, we wanted to build an attack which uses only traces, no text
and no template. In our attack, all the round-keys have already been precomputed, as is the case in most software AES implementations. So using the leakage
of the KeyExpansion, as done by Mangard in [7], is impossible: the attacker only
observes leakage from the AES round functions.
In the state of the art, there are two kinds of approaches in SCA.
One consists in a divide and conquer strategy to attack one part (e.g. a byte)
at a time, as in classical attacks [1,2]. An attacker gives a score (for example a
probability or a correlation) to each key byte guess. The difficulty is to enumerate
all possible round-key guesses in a way that minimizes the rank of the correct
round-key byte. This problem is the focus of many papers such as [8,9].
Recently, in different works such as [10,11,12,13,14], a new method consists in
directly using links between variables of an algorithm. Information from each trace
feeds a BP algorithm or a SAT-solver which usually converges to the correct key.
The first time that BP was used in SCA on AES was in the attack of Veyrat-Charvillon et al. [14]. They use BP on the whole AES algorithm to derive a
global template attack.
Grosso et al. [15] compared both approaches and concluded that BP is slightly
better.
The presented work was made in parallel and independently from the recent
works [14,15]. The strategy was chosen to be applied to a real experimental
attack. The presented attack is divided in two parts:
1. a divide and conquer attack focusing on key bytes of different rounds (§ 3);
2. linking the information obtained for each byte of every round-key (§ 4).
So, one contribution of our attack is to merge these two approaches and take
advantage of both.
2.3 Attack-path
An attack-path is an exploitable relation between some observables (data or
measurements) and the target, here the key K. Generally the attack-path in SCA,
as in [1,2], links a text with a measurement. In our attack, we want to link two EM
or power leakage measurements. The attack-path is therefore between two rounds
(AddRoundKey of round r and the S-boxes of round r + 1); it is illustrated in
Fig. 1. The attacker has no text at her disposal, so she needs a strong
leakage as it is her only source of information. More precisely, the most leaking
functions in AES are MC and SB, so the leakages used in this attack are at the
output of both computations.
[Figure: MC → X (leakage EM(X)) → ⊕ $K_r$ → SB → Y (leakage EM(Y)) → SR]
Fig. 1: Attack path
Hence this attack is possible on every round except rounds 0 and 10. Indeed,
there is no MC before the xor with $K_0$ and there is no SB after $K_{10}$.
In the following, K denotes the discrete random variable on a targeted key
byte $K_r^{l,c}$, a guess is noted k and $\mathcal{K}$ is the set of guesses k, $\mathcal{K} = [[0, 255]]$. The
correct value is noted k̂ (i.e. $K_r^{l,c} = \hat{k}$).
X denotes the discrete random variable on the byte at the input of AddRoundKey, an event is noted X = x with x ∈ [[0, 255]]. Likewise, Y denotes
the discrete random variable on the byte at the output of SubBytes, an event is
noted Y = y with y ∈ [[0, 255]].
The mathematical model for the leakage is a Hamming weight (HW) with
an additive Gaussian noise. This model is the classic model used in [1,2].
2.4 Theoretical attack-path
Before presenting our attack in practice, the relevance of the theoretical
attack-path is proved; i.e. whether with only pairs $(h_x, h_y)$ of Hamming weights
(without noise) it is possible to deduce the value k̂. This approach is similar to
algebraic attacks [16,17,18].
The function HW is not invertible; the set $HW^{-1}(h)$, whose cardinality depends
on the value of h, is the fiber of h by HW:
$$HW^{-1}(h) = \{x \text{ such that } HW(x) = h\}.$$
For the correct guess k̂, the following equation is verified:
$$SB(\hat{k} \oplus x) = y.$$
The attacker has only pairs $(h_x, h_y)$. For each pair only a subset of guesses
k is possible. Let $\mathcal{K}_{(h_x,h_y)}$ be such a subset of guesses:
$$\mathcal{K}_{(h_x,h_y)} = \{k \text{ such that } \exists x \in HW^{-1}(h_x) \text{ and } HW(SB(k \oplus x)) = h_y\}.$$
Let $\mathcal{K}(\hat{k})$ be the intersection of the sets $\mathcal{K}_{(h_x,h_y)}$ built with all 256 possible values
of x, and the unknown and correct key k̂:
$$\mathcal{K}(\hat{k}) = \bigcap_{x=0}^{255} \mathcal{K}_{(h_x,h_y)}.$$
The correct guess k̂ belongs to $\mathcal{K}(\hat{k})$. Thus, the first natural idea is to use a sieve
to discriminate the wrong guesses.
We have studied all the cases for each key byte value k̂. The sieve is not
enough, because there exist some values k̂ such that one wrong guess is not
discriminated:
$$\mathcal{K}(\hat{k}) = \{\hat{k}, k\}.$$
However it can be observed that the sets $\mathcal{K}(\hat{k})$ are all different, as illustrated in
the following example. Besides, they can be computed once and for all, for every
possible value of the correct key k̂; for example:
$$\mathcal{K}(25) = \{25, 62\}, \qquad \mathcal{K}(62) = \{62\}.$$
If the attacker has used all possible pairs $(h_x, h_y)$, she has computed the set
$\mathcal{K}(\hat{k})$. Since all the sets $\mathcal{K}(\hat{k})$ are different, the attacker can discriminate the
correct key k̂. The attack-path is valid.
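The sieve of this section can be reproduced with a short enumeration. The sketch below is a minimal illustration, not the authors' code: the function names are ours and the S-box is rebuilt from its algebraic definition. It relies on the observation that a guess k belongs to the set built from a pair exactly when that pair can be produced under k, so a guess survives the sieve when every noise-free pair produced under k̂ can also be produced under k.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product


def build_sbox() -> list:
    """Rebuild the AES S-box: field inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


def hw(value: int) -> int:
    """Hamming weight of a byte."""
    return bin(value).count("1")


def pair_set(k: int, sbox: list) -> frozenset:
    """All noise-free pairs (h_x, h_y) that can be observed when the key byte is k."""
    return frozenset((hw(x), hw(sbox[k ^ x])) for x in range(256))


def sieve_sets(sbox: list) -> list:
    """For each correct key k_hat, the set of guesses that survive the sieve."""
    pairs = [pair_set(k, sbox) for k in range(256)]
    # k survives iff every pair observable under k_hat is also producible under k
    return [{k for k in range(256) if pairs[k_hat] <= pairs[k]}
            for k_hat in range(256)]
```

With `sets = sieve_sets(build_sbox())`, the entry `sets[25]` can be compared with the example given above, and `len({frozenset(s) for s in sets})` counts how many distinct sets there are.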
2.5 Leakage model
Our model for the leakage is a Hamming weight (HW) with an additive
Gaussian noise. In this paper, for a given discrete random variable Z, the discrete
random variable representing the Hamming weight of Z is noted $H_Z$, the event
“the Hamming weight of Z is $h_z$” is denoted $H_Z = h_z$ for $h_z$ ∈ [[0, 8]]. $H'_Z$ denotes
the continuous random variable representing the “measured” Hamming weight;
an event is noted $H'_Z = h'_z$, with $h'_z \in \mathbb{R}$ such that:
$$h'_z = h_z + \delta, \qquad (2)$$
with δ an event of the Gaussian random variable $N(0, \sigma_Z^2)$. For a continuous
random variable $H'_Z$, $F_{H'_Z}$ denotes its probability density function. The probability density function associated to $N(0, \sigma_Z^2)$ is given by:
$$F_{\sigma_Z}(z) = \frac{1}{\sigma_Z \sqrt{2\pi}} \cdot \exp\left(-\frac{1}{2} \cdot \left(\frac{z}{\sigma_Z}\right)^2\right). \qquad (3)$$
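Equations (2) and (3) translate directly into code. The following sketch uses names of our own choosing (`hamming_weight`, `gaussian_pdf`, `leak`) and only the Python standard library; it is meant as an illustration of the model, not as the simulation code used for the results.

```python
import math
import random


def hamming_weight(value: int) -> int:
    """Number of bits set in a byte."""
    return bin(value).count("1")


def gaussian_pdf(z: float, sigma: float) -> float:
    """Density of N(0, sigma^2) at z, as in equation (3)."""
    return math.exp(-0.5 * (z / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def leak(value: int, sigma: float, rng: random.Random) -> float:
    """Measured Hamming weight h' = h + delta, as in equation (2)."""
    return hamming_weight(value) + rng.gauss(0.0, sigma)
```

For instance, `leak(0x0F, 0.5, random.Random(0))` draws one noisy observation around the true Hamming weight 4.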
3 Attack on each round-key byte
In practice, the attacker does not have pairs of Hamming weights, but leakage
measurements.
3.1 Points of interest
In order to use observed Hamming weights $h'_z$, the time when they can be
observed in the trace must be identified. The points of interest, denoted PoI, are
a set of points which correspond to the moments of information leakage. The
detection of PoI is a critical point when performing an SCA attack but it is not
the subject of this paper. We consider that PoI can be found using the method
of [19], without a profiling phase.
3.2 Getting observed Hamming weights from physical measures
In this paragraph, the PoI are known and we assume that the noise follows a
normal distribution $N(0, \sigma_Z^2)$. Physical measures are obtained, for example,
with an oscilloscope.
The goal of our attack is to succeed without using a template approach: the
attacker may not have a profiling device. In this case, the attacker has to guess
the standard deviation $\sigma_Z$ of the noise. A guess is noted $\sigma_G$. She considers the 9
theoretical Gaussian distributions $N(h, \sigma_G^2)$ centered on the different Hamming
weights h ∈ [[0, 8]]. They are added, each weighted by the binomial coefficient $\binom{8}{h}$, to create a new distribution from which a
new standard deviation $\sigma_H$ is computed:
$$\sigma_H = \mathrm{std}\left(\sum_{h \in [[0,8]]} \binom{8}{h} N(h, \sigma_G^2)\right).$$
She computes the mean $\overline{M}$ of the measured values $M = (m_i)_{1 \le i \le n}$ at a PoI.
Then the observed Hamming weights are computed as follows:
$$h'_i = (m_i - \overline{M}) \cdot \frac{\sigma_H}{\mathrm{std}(M)} + 4.$$
If the attack succeeds, $\sigma_G$ is a good approximation.
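The rescaling above can be sketched as follows. The function names are ours, and two points are assumptions rather than statements from the paper: the mixture is normalized by 256 so that it is a probability distribution, and the population standard deviation is used for std(M). Under that normalization the mixture has mean 4 and variance $\sigma_G^2 + 2$, since the Hamming weight of a uniform byte has variance 2.

```python
import math
import statistics


def sigma_h(sigma_g: float) -> float:
    """Standard deviation of the mixture of N(h, sigma_g^2), weights C(8, h)/256."""
    weights = [math.comb(8, h) / 256 for h in range(9)]
    mean = sum(w * h for h, w in enumerate(weights))  # equals 4
    second_moment = sum(w * (sigma_g ** 2 + h * h) for h, w in enumerate(weights))
    return math.sqrt(second_moment - mean ** 2)


def observed_hw(measures: list, sigma_g: float) -> list:
    """Center the raw measures, rescale them to std sigma_h and shift to mean 4."""
    m_bar = statistics.fmean(measures)
    scale = sigma_h(sigma_g) / statistics.pstdev(measures)  # needs non-constant data
    return [(m - m_bar) * scale + 4 for m in measures]
```

After the transformation the observed values have mean 4 and standard deviation `sigma_h(sigma_g)` by construction, whatever the unknown gain and offset of the measurement chain.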
3.3 Bayesian Inference
In this part, the goal is to build, for each guess k, the probability that a
round-key byte K equals k given a set of n measured pairs
$(H'_X, H'_Y) = (h'_x, h'_y)$. The main idea is to study the joint probability. This kind of approach is used
in stochastic attacks [20] or in the attack of Linge et al. [19].
Throughout the paper, the following notations are used. A set of n pairs
$(h_x, h_y)$ is denoted $\{(h_x, h_y)\}_n$, the i-th pair is denoted $(h_x, h_y)_i$. Likewise a set
of n pairs $(h'_x, h'_y)$ is denoted $\{(h'_x, h'_y)\}_n$ and the i-th pair is denoted $(h'_x, h'_y)_i$.
The probability $A_k$ for a guess k given the measurements $(H'_X, H'_Y)$ is
defined as follows:
$$A_k = \Pr\left[K = k \mid \{(h'_x, h'_y)\}_n\right]. \qquad (4)$$
The context can be represented with a belief network (as in [21]). It is a graph
where the nodes are variables, as illustrated in Fig. 2.
[Figure: belief network with nodes X, K, Y, $H_X$, $H_Y$, $H'_X$, $H'_Y$]
Fig. 2: Modeling the problem with a graph. An arrow means influence between
two variables. The variables in the rectangle have a different value at each execution, while the value of the variables outside the rectangle is fixed throughout
the attack.
At the start of the attack all guesses are equiprobable; the prior distribution
is uniform:
$$\forall k \in \mathcal{K}, \quad \Pr[K = k] = \frac{1}{256}. \qquad (5)$$
Probabilities $\Pr[(H_X, H_Y) = (h_x, h_y) \mid K = k]$ can be precomputed once and for
all by enumeration on the value x.
So the attacker wants to evaluate the probability $A_k$ given by equation (4),
i.e. the probability of K = k given a set of measurements. Bayes' theorem
implies:
$$A_k = \frac{\overbrace{F_{(H'_X,H'_Y)}\left(\{(h'_x, h'_y)\}_n \mid K = k\right)}^{A1_k} \cdot \Pr[K = k]}{\underbrace{F_{(H'_X,H'_Y)}\left(\{(h'_x, h'_y)\}_n\right)}_{A0_k}}.$$
The denominator $A0_k$ can be obtained by normalization; there is no need to
compute it. The pairs $(H'_X, H'_Y)_i$ are independent and identically distributed;
i.e. all the pairs have the same probability distribution and all the pairs are
mutually independent. It means that a pair $(H'_X, H'_Y)_2$ cannot be predicted from
the previous pair $(H'_X, H'_Y)_1$; thus:
$$A1_k = \prod_{i=1}^{n} \underbrace{F_{(H'_X,H'_Y)}\left((h'_x, h'_y)_i \mid K = k\right)}_{A2_k}.$$
Now, the probability of a single pair is needed:
$$A2_k = F_{(H'_X,H'_Y)}\left((h'_x, h'_y) \mid K = k\right).$$
The law of total probability implies that:
$$A2_k = \sum_{(h_x,h_y)} \underbrace{F_{(H'_X,H'_Y)}\left((h'_x, h'_y) \mid (h_x, h_y)\right)}_{A3_k} \cdot \Pr\left[(h_x, h_y) \mid K = k\right],$$
with
$$A3_k = F_{(H'_X,H'_Y)}\left((h'_x, h'_y) \mid (h_x, h_y)\right).$$
Given the pair $(H_X, H_Y)$, the variables $H'_X$ and $H'_Y$ are independent. For
a fixed $h_x$, $H'_X$ and $H_Y$ are independent, thus:
$$F_{H'_X}\left(h'_x \mid (h_x, h_y)\right) = F_{H'_X}\left(h'_x \mid H_X = h_x\right).$$
Likewise, for a fixed $h_y$, $H'_Y$ and $H_X$ are independent, thus:
$$F_{H'_Y}\left(h'_y \mid (h_x, h_y)\right) = F_{H'_Y}\left(h'_y \mid H_Y = h_y\right).$$
Thus:
$$A3_k = F_{H'_X}\left(h'_x \mid H_X = h_x\right) \cdot F_{H'_Y}\left(h'_y \mid H_Y = h_y\right).$$
But $F_{H'_X}(h'_x \mid H_X = h_x)$ follows the normal distribution centred in $h_x$, so:
$$F_{H'_X}\left(h'_x \mid H_X = h_x\right) = F_{\sigma_X}(h'_x - h_x).$$
Likewise:
$$F_{H'_Y}\left(h'_y \mid H_Y = h_y\right) = F_{\sigma_Y}(h'_y - h_y).$$
Finally, the probability $A_k$ that a round-key byte K equals k, for some given
measurements of $(H'_X, H'_Y)$, is proportional to the product (see footnote 5):
$$A_k \propto \prod_{i=1}^{n} \sum_{(h_x,h_y)} F_{\sigma_X}\left(h'_{x,i} - h_x\right) \cdot F_{\sigma_Y}\left(h'_{y,i} - h_y\right) \cdot \Pr\left[(h_x, h_y) \mid K = k\right]. \qquad (6)$$
Note that, in the previous equation, the Gaussian noise hypothesis can be
relaxed by replacing the Gaussian probability density functions ($F_{\sigma_X}$ and $F_{\sigma_Y}$)
by whatever probability density function the attacker can come up with.
At the end of this part, the attacker has a probability for each guess k on
every key byte of rounds 1 to 9.
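Equation (6) can be evaluated directly once the conditional probabilities of the noise-free pairs are tabulated. The sketch below is a minimal illustration under our own assumptions: all names are hypothetical, the S-box is rebuilt from its algebraic definition, x is drawn uniformly, and the score is accumulated as a sum of logarithms so that the product over n measurements does not underflow.

```python
import math
import random


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product


def build_sbox() -> list:
    """Rebuild the AES S-box: field inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


def hw(value: int) -> int:
    return bin(value).count("1")


def gaussian_pdf(z: float, sigma: float) -> float:
    return math.exp(-0.5 * (z / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def pair_distribution(k: int, sbox: list) -> dict:
    """Pr[(h_x, h_y) | K = k], obtained by enumerating the 256 values of x."""
    dist = {}
    for x in range(256):
        pair = (hw(x), hw(sbox[k ^ x]))
        dist[pair] = dist.get(pair, 0.0) + 1 / 256
    return dist


def log_scores(observations: list, sigma_x: float, sigma_y: float, sbox: list) -> list:
    """Logarithm of the right-hand side of (6) for each of the 256 guesses."""
    dists = [pair_distribution(k, sbox) for k in range(256)]
    scores = [0.0] * 256
    for obs_x, obs_y in observations:
        fx = [gaussian_pdf(obs_x - h, sigma_x) for h in range(9)]
        fy = [gaussian_pdf(obs_y - h, sigma_y) for h in range(9)]
        for k in range(256):
            s = sum(fx[hx] * fy[hy] * p for (hx, hy), p in dists[k].items())
            scores[k] += math.log(s) if s > 0.0 else float("-inf")
    return scores


def simulate_pairs(k_hat: int, n: int, sigma: float, sbox: list, rng: random.Random) -> list:
    """Simulated noisy pairs (h'_x, h'_y) for a known key byte (testing only)."""
    pairs = []
    for _ in range(n):
        x = rng.randrange(256)
        y = sbox[k_hat ^ x]
        pairs.append((hw(x) + rng.gauss(0, sigma), hw(y) + rng.gauss(0, sigma)))
    return pairs
```

The rank of the correct byte is then the number of guesses with a strictly larger score, for example `sum(s > scores[k_hat] for s in scores)`.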
Footnote 5: $h'_{x,i}$ is the i-th measurement $h'_x$.
4 Crossing information from round-key bytes with BP
4.1 Goal
The round-key bytes are linked by KeyExpansion relations (1). There are
16 · 9 round-key bytes linked together using 16 · 8 equations. It is supposed that
the correct key is the one minimizing the ranks of its round-key byte values
across all 9 rounds. In this part, this additional information is crossed with the
estimations to improve the probabilities Pr[K = k] and to have a key which
respects the rules of KeyExpansion. To this end, in this part a technique known
as Belief Propagation (or sum-product algorithm) [22] is used.
BP was first used by Gallager [23] for decoding low-density parity-check
(LDPC) codes. It was then rediscovered by Tanner [24] and formalized by Pearl [25].
The first time that BP was used in SCA on AES was in the attack of Veyrat-Charvillon et al. [14]; it
was then studied in [15].
4.2 Factor Graph
The BP algorithm relies on a bipartite graph called a factor graph (or Tanner
graph). To each node in the factor graph is associated some information.
[Figure: fragment of the factor graph, with variable nodes $K_{r-1}^{l,0}$, $K_{r-1}^{l,1}$, $K_{r-1}^{l+1,3}$, $K_r^{l,0}$, $K_r^{l,1}$, $K_r^{l+1,3}$, $K_{r+1}^{l,0}$ and factor nodes $E_{r-1}^{l,1}$, $E_r^{l,0}$, $E_r^{l,1}$, $E_r^{l+1,3}$, $E_{r+1}^{l,0}$, $E_{r+1}^{l,1}$; two edges are labeled S]
Fig. 3: Part of the factor graph associated with the AES KeyExpansion. Circles
are variable nodes (round-key bytes) and squares are factor nodes (equations).
Equations are labeled using the same indexes as the round-key byte they define,
i.e. equation $E_r^{l,c}$ is the equation used to create $K_r^{l,c}$. The S-labeled edges recall
the use of the S-box in the equation for that particular byte.
The nodes of a factor graph are of two kinds:
– variable nodes, in our case representing round-key bytes;
– factor nodes, in our case representing equations used in the KeyExpansion.
An edge links a variable node with a factor node, when the equation represented
by the factor node involves the byte represented by the variable node. A part of
the factor graph associated with the KeyExpansion is illustrated in Fig. 3.
In the following, the notation N (·) applied to a node is used to denote the
set of neighbours of that node. Thus N (K) is the set of equations involving
round-key byte K, and N (E) is the set of round-key bytes composing equation
E. Finally, a factor node E “is satisfied” when the corresponding KeyExpansion
equation is satisfied.
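The factor graph can be built mechanically from system (1). In the sketch below, which uses identifiers of our own choosing, a round-key byte is identified by the triple (r, l, c) and each equation carries the identifier of the byte it defines, as in Fig. 3. Only rounds 1 to 9 are included, since these are the rounds for which the attack provides scores.

```python
def build_factor_graph(first_round: int = 1, last_round: int = 9) -> dict:
    """Map each equation id (r, l, c) to N(E), the three bytes it involves."""
    factors = {}
    for r in range(first_round, last_round):
        for l in range(4):
            # first line of (1): K_{r+1}^{l,0} from K_r^{l,0} and SB(K_r^{l+1 mod 4,3})
            factors[(r + 1, l, 0)] = [(r + 1, l, 0), (r, l, 0), (r, (l + 1) % 4, 3)]
            for c in range(1, 4):
                # second line of (1): K_{r+1}^{l,c} from K_r^{l,c} and K_{r+1}^{l,c-1}
                factors[(r + 1, l, c)] = [(r + 1, l, c), (r, l, c), (r + 1, l, c - 1)]
    return factors


def variable_neighbours(factors: dict) -> dict:
    """Map each round-key byte to N(K), the equations it appears in."""
    neighbours = {}
    for equation, variables in factors.items():
        for variable in variables:
            neighbours.setdefault(variable, []).append(equation)
    return neighbours
```

With the default arguments this yields the 16 · 8 equations over 16 · 9 round-key bytes mentioned in section 4.1, each equation involving exactly three bytes.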
4.3 Algorithm
Algorithm 1 Belief Propagation Algorithm.
Inputs: Experimental distributions $A_k$ of every round-key byte K; m the maximal
number of iterations of BP.
Outputs: Final distributions $B_k$ deduced using KeyExpansion relations.
for all round-key byte K and value k and equation E do
    $\mu_{K \to E}(k) = A_k$
end for
for j = 1 to m do
    for all round-key byte K and value k and equation E do
        $\mu_{E \to K}(k) = \sum_{(k_1,k_2) \in \mathcal{K}^2} E(k, k_1, k_2) \cdot \mu_{K_1 \to E}(k_1) \cdot \mu_{K_2 \to E}(k_2)$
        with $N(E) \setminus \{K\} = \{K_1, K_2\}$.
    end for
    for all round-key byte K and value k and equation E do
        $\mu_{K \to E}(k) \propto A_k \prod_{E_1 \in N(K) \setminus \{E\}} \mu_{E_1 \to K}(k)$
    end for
end for
for all round-key byte K and value k do
    $B_k \propto A_k \prod_{E \in N(K)} \mu_{E \to K}(k)$
end for
The BP algorithm in the general case is summed up in Algorithm 1.
In our case, the inputs of BP are the probabilities $A_k$ found in (6). For a
key byte K, BP computes $B_k$, a better belief, from the initial value $A_k$. As already stated, nodes in the factor graph exchange information messages with their
neighbours. More precisely, since the graph is bipartite, two types of messages
are exchanged:
– variable to factor messages between a variable node (key byte) K and a factor
node (equation) E, denoted $\mu_{K \to E}(k)$;
– factor to variable messages between a factor node E and a variable node K,
denoted $\mu_{E \to K}(k)$.
$B_k$ is computed according to the input probability $A_k$ and to the probabilities
Pr[K = k|E], conditional on each factor node E in N(K) being satisfied, using the
following equation:
$$B_k \propto A_k \prod_{E \in N(K)} \Pr[K = k \mid E]. \qquad (7)$$
Let $N(E) \setminus \{K\} = \{K_1, K_2\}$. Thanks to the law of total probability, Pr[K = k|E]
can then be obtained by summing $\Pr[K_1 = k_1] \cdot \Pr[K_2 = k_2]$ over all the possible
values for $k_1$ and $k_2$ such that factor node E is satisfied. Writing $E(k, k_1, k_2)$ for the
function equal to 1 when the equation is satisfied by $(k, k_1, k_2)$ and 0 otherwise, the following
equation holds:
$$\Pr[K = k \mid E] = \sum_{(k_1,k_2) \in \mathcal{K}^2} E(k, k_1, k_2) \cdot \Pr[K_1 = k_1] \cdot \Pr[K_2 = k_2]. \qquad (8)$$
$\Pr[K_1 = k_1]$ and $\Pr[K_2 = k_2]$ are needed to compute Pr[K = k|E], which depends
on:
$$E \in N(K) \cap N(K_1) \cap N(K_2).$$
Hence, using Equation (7) directly on $K_1$ and $K_2$ would create a self-convincing
loop for node K. To avoid that problem, the factor corresponding to node E is
removed from the product in Equation (7) in that case:
$$\Pr[K_1 = k_1] \propto A_{k_1} \prod_{E_1 \in N(K_1) \setminus \{E\}} \Pr[K_1 = k_1 \mid E_1]. \qquad (9)$$
However, it can be shown [22,23] that the equations (7), (8) and (9) do
not hold in general because they require an independence assumption on the
probabilities used in the different products. In [26], the authors show that in practice
the equations can be replaced by approximations, and that BP then gives excellent results.
So, the equations (7), (8), and (9) are respectively replaced by:
$$B_k \propto A_k \prod_{E \in N(K)} \mu_{E \to K}(k) \qquad (10)$$
$$\mu_{E \to K}(k) = \sum_{(k_1,k_2) \in \mathcal{K}^2} E(k, k_1, k_2) \cdot \mu_{K_1 \to E}(k_1) \cdot \mu_{K_2 \to E}(k_2) \qquad (11)$$
$$\mu_{K \to E}(k) \propto A_k \prod_{E_1 \in N(K) \setminus \{E\}} \mu_{E_1 \to K}(k) \qquad (12)$$
To complete the description of the BP algorithm, an initialization step is done
before applying the above equations. The variable to factor messages µK→E (k)
are initialized with the prior probabilities Ak corresponding to round-key byte K.
In summary, after an initialization phase, BP works by alternatively applying
equations (11) then (12) for every edge (K, E) in the graph. At the end of the
execution, the returned value Bk is computed using equation (10). The number
of iterations is not precisely defined but BP converges rapidly.
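The two message updates can be illustrated on the simplest kind of equation, the pure xor of the second line of system (1), for which the constraint is satisfied exactly when k = k1 ⊕ k2. The sketch below uses hypothetical names and plain probabilities instead of log scores; it covers neither the S-box equations (which would additionally permute one message through SB) nor the scheduling over the whole graph.

```python
def normalise(vector: list) -> list:
    """Scale a non-negative vector so that it sums to 1."""
    total = sum(vector)
    return [v / total for v in vector]


def xor_factor_message(mu1: list, mu2: list) -> list:
    """Factor-to-variable message for an equation k = k1 xor k2.

    The double sum over (k1, k2) keeps only the terms that satisfy the
    equation; len(mu1) == len(mu2) must be a power of two.
    """
    out = [0.0] * len(mu1)
    for k1, p1 in enumerate(mu1):
        if p1 == 0.0:
            continue  # this k1 contributes nothing
        for k2, p2 in enumerate(mu2):
            out[k1 ^ k2] += p1 * p2
    return out


def variable_message(prior: list, incoming: list) -> list:
    """Variable-to-factor message: prior times the messages of the other factors."""
    out = list(prior)
    for message in incoming:
        out = [a * b for a, b in zip(out, message)]
    return normalise(out)


def belief(prior: list, incoming: list) -> list:
    """Final belief: same product as above, but over every neighbouring factor."""
    return variable_message(prior, incoming)
```

When both incoming messages are concentrated on single values, the outgoing message is concentrated on their xor, which is the behaviour expected from the hard constraint.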
At the end, the attacker deduces from the BP outputs $B_k$ 9 probable round-keys. The attack succeeds if one of these 9 probable round-keys is an actual
round-key, i.e. it is derived from the correct master key using the KeyExpansion.
Finally it is interesting to note that using BP for enhancing the probabilities
on the different round-keys would work better as the number of rounds increases.
Indeed, each new round brings independent information on the key that can be
crossed with all other rounds.
5 Results
5.1 Simulation results
The simulations are done with the programming language (v0.4). They
would correspond to an attack against a typical unprotected 8-bit software implementation of AES. Plaintexts are randomly generated. Measured Hamming
weights are simulated with a noise according to $N(0, \sigma^2)$ for various standard
deviations σ. To facilitate the simulation the noise is supposed to be the same
on X and Y: $\sigma = \sigma_X = \sigma_Y$.
We emphasize that to overcome floating point arithmetic issues, the normalization steps in both the Bayesian attack and the BP algorithm are not
performed. As such, we work with scores corresponding to the logarithm of
probabilities instead of probabilities directly.
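Working with logarithms turns the products of equations (6), (10) and (12) into sums, but the sum of equation (11) still has to be evaluated. The paper does not say how this is done; one common technique, shown here purely as an assumption, is the log-sum-exp trick, which factors out the largest term before exponentiating.

```python
import math


def log_sum_exp(log_values: list) -> float:
    """Return log(sum(exp(v))) without underflow for very negative inputs."""
    largest = max(log_values)
    if largest == float("-inf"):
        return largest  # every term has probability zero
    return largest + math.log(sum(math.exp(v - largest) for v in log_values))
```

For example, `log_sum_exp([-1000.0, -1000.0])` evaluates to about -999.3, whereas exponentiating each term first would underflow to zero.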
Simulation to retrieve a key byte from pairs of noisy Hamming
weights: First, at the level of a single byte using Bayesian inference (§ 3.3).
For different noise standard deviation σ and different numbers of traces n, the
average rank of the good key byte has been computed, for 100 simulated attacks
for each possible value of the key k̂. The results are displayed in Table 1.
Table 1: Average rank of the good key byte k̂ according to the noise standard
deviation σ and the number of traces n, for 100 simulated attacks for each
possible value of the key k̂.
n \ σ     0.1    0.2    0.3    0.5    1.0    1.5    2.0    3.0
100       1.2    1.3    2.3    14     66     96     107    119
1000      1      1      1      1      7.1    35     66     97
10000     1      1      1      1      1      2.2    12     48
100000    1      1      1      1      1      1      1.1    7.3
Using BP on simulation results: Now, the attack on the whole master key
is simulated to see the additional benefit of BP. The algorithm returns 9 round-keys. The measure used here is the minimum of the Hamming distances
between the guessed round-keys and the correct round-keys, where the minimum
is taken over the nine round-keys. The results are summarized in Table 2.
Table 2: Hamming distance between the best key found by BP and the correct
master key K according to the noise standard deviation σ and the number of
traces n, estimated over 100 simulated attacks.
n \ σ     0.1    0.2    0.3    0.5    1.0    1.5    2.0    3.0
100       0      0      0      0      59     51     53     54
1000      0      0      0      0      0      39     46     51
10000     0      0      0      0      0      0      0      40
100000    0      0      0      0      0      0      0      0
As can be seen, the results are sharply separated: either the attack always succeeds
or it always fails completely. Nonetheless, the number of traces required for the
attack to succeed using BP is an order of magnitude below what is required
without BP. Finally, the improvement brought by BP to the attack is illustrated in
Table 3.
Table 3: Success of the attack according to the noise standard deviation σ and
the number of traces n, for 100 simulated attacks. In the original table, two distinct
check marks separate the cases where the attack always succeeds even without BP
from those where it succeeds only with BP; both appear as X here. × indicates the attack fails.
n \ σ     0.1    0.2    0.3    0.5    1.0    1.5    2.0    3.0
100       X      X      X      X      ×      ×      ×      ×
1000      X      X      X      X      X      ×      ×      ×
10000     X      X      X      X      X      X      X      ×
100000    X      X      X      X      X      X      X      X
Using the BP algorithm considerably improves the success rate. It makes it
possible to reduce the number of traces required for the attack to succeed. For
example, when σ is equal to 0.1, only 100 traces are required thanks to BP, as
opposed to 1000 traces without BP.
6 Conclusion
This paper presents a new side channel attack targeting the AES key. The
first motivation for this paper was to realize an attack without texts and without
templates, using only leakage measurements. The leakage is considered as a
Hamming weight with an additive Gaussian noise. On each round 1 to 9 of
the AES, two points of leakage are required to define the attack path without
any text.
First, with a Bayesian inference approach, a score is assigned to each round-key byte for all rounds from 1 to 9. Then, the second step is to use the KeyExpansion rules to aggregate the knowledge on the round-key bytes to discriminate
the correct key. A belief propagation algorithm is used for that purpose.
Simulation results have shown that the attack is effective: using the BP algorithm is a very good way to enhance the chances of recovering the key. Even
in the presence of strong noise the attack can succeed. The BP algorithm
approach can be used in combination with any other attack able to score all
round-key bytes on several consecutive rounds. Additionally, it shows that increasing the number of rounds in a crypto-algorithm in order to make it resist
classical cryptanalysis can weaken it with respect to our attack.
Finally, we would like to explore whether masked implementations effectively
protect against this attack.
Acknowledgment
The authors would like to thank Christophe Clavier (University of Limoges, France), Guillaume Reymond (Tiempo, France), Assia Tria (CEA-TECH,
France), and Antoine Wurker (Eshard, France) for their valuable contributions
to the development and understanding of the issues discussed in the paper.
This work was initiated while Hélène Le Bouder was at École des Mines de
Saint-Étienne. This work was partially funded by the French National Research
Agency (ANR) as part of the program Digital Engineering and Security (INS2013), under grant agreement ANR-13-INSE-0006-01 and by the French DGCIS
(Direction Générale de la Compétitivité de l’Industrie et des Services) through
the CALISSON 2 project.
References
1. Paul C. Kocher, Joshua Jaffe, and Benjamin Jun. Differential Power Analysis. In
CRYPTO ’99, volume 1666 of LNCS, pages 388–397. Springer.
2. Eric Brier, Christophe Clavier, and Francis Olivier. Correlation Power Analysis with a Leakage Model. In CHES 2004, volume 3156 of LNCS, pages 16–29.
Springer, 2004.
3. Hélène Le Bouder, Ronan Lashermes, Yanis Linge, Bruno Robisson, and Assia
Tria. A Unified Formalism for Physical Attacks. IACR Cryptology ePrint, 2014.
4. Neil Hanley, Michael Tunstall, and William P. Marnane. Unknown Plaintext Template Attacks. In WISA 2009, volume 5932 of LNCS, pages 148–162. Springer.
5. Cédric Archambeau, Eric Peeters, François-Xavier Standaert, and Jean-Jacques
Quisquater. Template Attacks in Principal Subspaces. In CHES 2006, volume
4249 of LNCS, pages 1–14. Springer.
6. NIST. Specification for the Advanced Encryption Standard. FIPS PUB 197, 2001.
7. Stefan Mangard. A Simple Power-Analysis (SPA) Attack on Implementations of the
AES Key Expansion. In ICISC 2002, pages 343–358. Springer.
8. Nicolas Veyrat-Charvillon, Benoît Gérard, Mathieu Renauld, and François-Xavier
Standaert. An Optimal Key Enumeration Algorithm and Its Application to Side-Channel Attacks. In SAC 2012, volume 7707 of LNCS, pages 390–406. Springer.
9. Sonia Belaïd, Jean-Sébastien Coron, Pierre-Alain Fouque, Benoît Gérard, Jean-Gabriel Kammerer, and Emmanuel Prouff. Improved Side-Channel Analysis of
Finite-Field Multiplication. In CHES 2015, volume 9293 of LNCS, pages 395–415.
Springer.
10. Daniel P Martin, Jonathan F O’Connell, Elisabeth Oswald, and Martijn Stam.
Counting Keys in Parallel After a Side Channel Attack. In ASIACRYPT 2015,
pages 313–337. Springer.
11. Andrey Bogdanov, Ilya Kizhvatov, Kamran Manzoor, Elmar Tischhauser, and
Marc Witteman. Fast and memory-efficient key recovery in side-channel attacks.
IACR Cryptology ePrint, 795, 2015.
12. Benoît Gérard and François-Xavier Standaert. Unified and optimized linear collision attacks and their application in a non-profiled setting. In CHES 2012, pages
175–192. Springer.
13. Xin Ye, Thomas Eisenbarth, and William Martin. Bounded, yet sufficient? How
to determine whether limited side channel information enables key recovery. In
Smart Card Research and Advanced Applications, pages 215–232. Springer, 2014.
14. Nicolas Veyrat-Charvillon, Benoît Gérard, and François-Xavier Standaert. Soft
Analytical Side-Channel Attacks. In ASIACRYPT 2014, pages 282–296.
15. Vincent Grosso and François-Xavier Standaert. ASCA, SASCA and DPA with
Enumeration: Which One Beats the Other and When? In ASIACRYPT 2015,
pages 291–312. Springer.
16. Nicolas Courtois. How Fast can be Algebraic Attacks on Block Ciphers? In Symmetric Cryptography, volume 07021 of Dagstuhl Seminar Proceedings, 2007.
17. Harris Nover. Algebraic Cryptanalysis of AES: An Overview. University of Wisconsin, USA, 2005.
18. Nicolas T. Courtois and Gregory V. Bard. Algebraic Cryptanalysis of the Data
Encryption Standard. In Cryptography and Coding 2007, volume 4887 of LNCS,
pages 152–169. Springer.
19. Yanis Linge, Cécile Dumas, and Sophie Lambert-Lacroix. Using the Joint Distributions of a Cryptographic Function in Side Channel Analysis. In COSADE 2014,
pages 199–213. Springer.
20. Werner Schindler, Kerstin Lemke, and Christof Paar. A Stochastic Model for
Differential Side Channel Cryptanalysis. In CHES 2005, volume 3659 of LNCS.
Springer, 2005.
21. David Barber. Bayesian Reasoning and Machine Learning. Cambridge University
Press, 04-2011 edition, 2011.
22. Frank R. Kschischang, Brendan J. Frey, and Hans-Andrea Loeliger. Factor graphs
and the sum-product algorithm. IEEE Trans. on Information Theory, 47(2):498–
519, 2001.
23. Robert G. Gallager. Low-density parity-check codes. IRE Trans. on Information
Theory, 8(1):21–28, 1962.
24. Robert M. Tanner. A recursive approach to low complexity codes. IEEE Trans.
on Information Theory, 27(5):533–547, 1981.
25. Judea Pearl. Reverend Bayes on Inference Engines: A Distributed Hierarchical
Approach. In National Conference on Artificial Intelligence 1982, pages 133–136.
AAAI Press.
26. Sae-Young Chung, G. David Forney Jr., Thomas J. Richardson, and Rüdiger L.
Urbanke. On the design of low-density parity-check codes within 0.0045 dB of the
Shannon limit. IEEE Communications Letters, 5(2):58–60, 2001.