5 Probability Theory and Symmetric Cryptanalysis
We summarise some basic facts in probability theory. Don’t panic about the maths. You will not be directly examined
on probability theory.
Probability distribution: Let X be a finite set and, for each x ∈ X, let p_x ∈ R be such that 0 ≤ p_x ≤ 1 and
Σ_{x∈X} p_x = 1.
The uniform distribution on a set X of size n is the probability distribution where every element has probability 1/n.
Examples: A fair coin gives a probability on the set X = {H, T} with p_H = 1/2 and p_T = 1/2. A fair dice
corresponds to the set X = {1, 2, 3, 4, 5, 6} and p_1 = p_2 = · · · = p_6 = 1/6. These are both uniform distributions.
A biased coin might fall heads 3/4 of the time and tails 1/4 of the time, so this would be X = {H, T} and p_H = 3/4,
p_T = 1/4.
For any set A ⊆ X the probability of A is p[A] = Σ_{a∈A} p_a. For example, the probability of a dice throw being
an odd number is
p[{1, 3, 5}] = 1/6 + 1/6 + 1/6 = 1/2.
For sets A, B ⊆ X we have
p[A ∪ B] = p[A] + p[B] − p[A ∩ B].
In particular, if A ∩ B = ∅ then p[A ∪ B] = p[A] + p[B].
We define the product distribution to be a probability on X × X with p_{(x1,x2)} = p_{x1} p_{x2}. For example, let
X = {H, T} be the coin toss. The product probability is on the set X × X = {(H, H), (H, T), (T, H), (T, T)} with
all probabilities equal to (1/2) · (1/2) = 1/4.
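For concreteness, the product distribution for two coin tosses can be computed in a few lines of Python (an illustrative sketch; the dictionary and variable names are just for this example):

    from itertools import product

    # Fair coin: uniform distribution on {H, T}.
    coin = {"H": 0.5, "T": 0.5}

    # Product distribution on X x X: p_(x1,x2) = p_x1 * p_x2.
    two_tosses = {(x1, x2): coin[x1] * coin[x2] for x1, x2 in product(coin, coin)}

    print(two_tosses)                 # each of the four outcomes has probability 0.25
    print(sum(two_tosses.values()))   # the probabilities sum to 1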
Conditional probability: (We will not use this much.) Let B ⊆ X and a ∈ X. Then
P[a|B] = P[a]/P[B] if a ∈ B, and P[a|B] = 0 otherwise.
For example, for the dice, the probability that the dice reads 3, given that it is an odd number is (1/6)/(1/2) = 1/3
and the probability that the dice reads 2, given that it is an odd number is 0.
For A ⊆ X we define
P[A|B] = Σ_{a∈A} P[a|B].
Note that P [A|B] = P [A ∩ B]/P [B].
Suppose B1, . . . , Bn are disjoint subsets of X whose union is X. Let A ⊆ X. Then
P[A] = Σ_{i=1}^{n} P[A|Bi] P[Bi].
Example: We consider two famous questions about the probability distribution on X = {B, G} with p_B = p_G = 1/2,
modelling whether a child born to a mother is a boy or a girl. If two children (not twins) are born then we
consider the product distribution on X × X = {(B, B), (B, G), (G, B), (G, G)}, where the first coordinate is the older
child. Then the probability of having two girls is 1/4. The two questions below have different answers; the short
sketch after them checks both by enumeration.
1. Mrs. Jones has two children. The older child is a girl. What is the probability that both children are girls?
2. Mrs. Jones has two children. One of her children is a girl. What is the probability that both children are girls?
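Under the usual reading of question 2 (at least one child is a girl) the answers are 1/2 and 1/3 respectively. They can be checked by enumerating the four equally likely (older, younger) outcomes; a small Python sketch, not part of the notes:

    from itertools import product

    outcomes = list(product("BG", repeat=2))   # (older, younger), each with probability 1/4

    # Question 1: condition on the older child being a girl.
    older_girl = [o for o in outcomes if o[0] == "G"]
    print(sum(o == ("G", "G") for o in older_girl) / len(older_girl))   # 0.5

    # Question 2: condition on at least one child being a girl.
    some_girl = [o for o in outcomes if "G" in o]
    print(sum(o == ("G", "G") for o in some_girl) / len(some_girl))     # 0.333...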
One-time pad: Fix any probability distribution on the message space. So message m ∈ M appears with probability p_m. Suppose the key is sampled from a uniform distribution and a ciphertext c ∈ C is received. Then the
probability that the message is m, conditioned on the ciphertext being c, is p_m. This is what “perfect security” means.
Output bias in RC4: We will explain that the second byte output in RC4 has probability close to 2/256 of being
zero, whereas in a uniformly random stream of bytes then this probability would be 1/256. This shows that the second
byte of RC4 is biased. It is not safe to use the one-time pad for encryption when the keystream is biased.
Assume a completely random set-up stage and write z1, z2, . . . for the output bytes of RC4. The state array S is
modelled as the product distribution of 256 copies of the uniform distribution on bytes. We will show that if S[2] = 0
and S[1] ≠ 2 then z2 = 0, and if S[2] ≠ 0 or S[1] = 2 then z2 is random (and so has probability 1/256 of being zero).
Hence, the probability that z2 = 0 is
1 · Pr[S[2] = 0 and S[1] ≠ 2] + (1/256) · Pr[S[2] ≠ 0 or S[1] = 2],
which is approximately 1/256 + 1/256 = 2/256.
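This bias can be checked experimentally. The Python sketch below (illustrative only) models the set-up stage as a uniformly random permutation of the state array and runs two steps of the standard RC4 output loop:

    import random

    def rc4_second_byte(S):
        """Two steps of the RC4 output (PRGA) loop; returns the second output byte z2."""
        i = j = 0
        z = None
        for _ in range(2):
            i = (i + 1) % 256
            j = (j + S[i]) % 256
            S[i], S[j] = S[j], S[i]
            z = S[(S[i] + S[j]) % 256]
        return z

    trials = 100_000
    zeros = 0
    for _ in range(trials):
        S = list(range(256))
        random.shuffle(S)              # idealised "completely random set-up stage"
        if rc4_second_byte(S) == 0:
            zeros += 1

    print(zeros / trials, 2 / 256, 1 / 256)   # observed frequency vs 2/256 vs 1/256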
Brute force attack on a cipher: Suppose one has a few plaintext-ciphertext pairs (mi, ci) for a cipher, so that
ci = Enc(k, mi). A brute-force attack is to try all values k′ for the key and compute Enc(k′, m1) to see if it is equal
to c1. If so, then check Enc(k′, mi) = ci for some more values just to be sure.
If the key is uniformly chosen from a set of n-bit binary strings then, in the worst case, you need to try 2^n guesses
for the key, and on average 2^(n−1) guesses. If the keys are sampled from a biased probability distribution then the
key search may require less work. This is why key lengths are usually chosen to have at least 64 bits, and
these days typically 128 bits or more.
Alternative approach: For a chosen-message m∗ compute a table of Enc(k, m∗ ) for all keys. This is usually
done using a hash table storing (Enc(k, m∗ ), k). When we wish to attack a user we just need to get a user to encrypt
m∗ and then look up in the table to deduce k. The main problem is that the table requires enormous storage.
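To make both ideas concrete, here is a Python sketch with a deliberately tiny key space; toy_enc (an XOR with a key-derived pad) merely stands in for a real cipher and is purely illustrative:

    import hashlib

    def toy_enc(key: bytes, msg: bytes) -> bytes:
        """Toy 'cipher': XOR msg with a key-derived pad. Illustration only, not a real cipher."""
        pad = hashlib.sha256(key + b"pad").digest()[: len(msg)]
        return bytes(m ^ p for m, p in zip(msg, pad))

    KEYBITS = 18                                   # tiny key space so the loops finish quickly
    secret_key = (99999).to_bytes(3, "big")
    m1, m2 = b"attack at dawn!", b"second message"
    c1, c2 = toy_enc(secret_key, m1), toy_enc(secret_key, m2)

    # Brute force: try every key against (m1, c1), confirm with (m2, c2).
    for k in range(2 ** KEYBITS):
        guess = k.to_bytes(3, "big")
        if toy_enc(guess, m1) == c1 and toy_enc(guess, m2) == c2:
            print("recovered key:", k)
            break

    # Chosen-message table: precompute Enc(k, m*) for all keys, then one lookup per victim.
    m_star = b"chosen message!"
    table = {toy_enc(k.to_bytes(3, "big"), m_star): k for k in range(2 ** KEYBITS)}
    print("table lookup:", table[toy_enc(secret_key, m_star)])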
Double DES: Since DES only has a 56-bit key it is vulnerable to a brute-force attack. So one idea is to choose
two keys k1 , k2 and construct the cipher
Enc(k1 , k2 , m) = DES(k2 , DES(k1 , m)).
You might think this is a good idea, but actually it adds very little security, due to the following meet-in-the-middle
attack: Given some pairs (mi, ci = Enc(k1, k2, mi)) the attacker does the following. First, try all keys k and compute
a hash table of pairs (DES(k, m1), k). Then try all keys k′ and compute DES^(-1)(k′, c1) and see if the value matches
a value in the hash table. The running time of the attack is 2^56 + 2^56 = 2^57 (plus storage proportional to 2^56), not
2^56 · 2^56 = 2^112.
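The attack is easy to reproduce at toy scale. In the sketch below a 16-bit toy cipher (toy_enc/toy_dec, purely illustrative) stands in for DES so that both loops finish quickly; the key pair is recovered with roughly 2 · 2^16 cipher operations plus a table, rather than 2^32 operations:

    from collections import defaultdict

    MASK = 0xFFFF          # toy cipher: 16-bit blocks and 16-bit keys
    MUL = 0x9D5            # odd constant, so multiplication is invertible modulo 2**16
    MUL_INV = pow(MUL, -1, 1 << 16)

    def rotl(x, r): return ((x << r) | (x >> (16 - r))) & MASK
    def rotr(x, r): return ((x >> r) | (x << (16 - r))) & MASK

    def toy_enc(k, m):
        """A small invertible toy cipher standing in for DES (illustration only)."""
        x = (m ^ k) & MASK
        x = (x * MUL) & MASK
        x = rotl(x, 7)
        return (x + k) & MASK

    def toy_dec(k, c):
        x = (c - k) & MASK
        x = rotr(x, 7)
        x = (x * MUL_INV) & MASK
        return x ^ k

    # Double encryption with two independent 16-bit keys.
    k1, k2 = 0x3A7C, 0xB105
    pairs = [(m, toy_enc(k2, toy_enc(k1, m))) for m in (0x0123, 0x4567, 0x89AB)]
    m1, c1 = pairs[0]

    # Meet in the middle: table of inner encryptions, then match outer decryptions.
    table = defaultdict(list)
    for ka in range(1 << 16):
        table[toy_enc(ka, m1)].append(ka)

    for kb in range(1 << 16):
        for ka in table.get(toy_dec(kb, c1), []):
            # Confirm the candidate pair on the remaining plaintext/ciphertext pairs.
            if all(toy_enc(kb, toy_enc(ka, m)) == c for m, c in pairs[1:]):
                print("found keys:", hex(ka), hex(kb))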
Triple DES: A more secure idea is to choose two keys k1 , k2 and construct the cipher
Enc(k1, k2, m) = DES(k1, DES^(-1)(k2, DES(k1, m))).
The reason for using DES decryption in the middle is for backwards compatibility: Setting k2 = k1 means that triple
DES becomes the original DES scheme, and so one implementation can perform both functions.
Another important class of attacks on block ciphers is differential cryptanalysis. We will not discuss this in the
course, but see Section 4.1 of Vaudenay’s book if you are interested.
6 Hash functions, MACs and authentication
We now discuss some cryptographic primitives that provide message integrity. Message authentication codes (MACs)
use a shared key and allow two users to authenticate to each other (later in the course we discuss digital signatures,
which are a more general solution). Hash functions are used to provide integrity: One can compute a hash of a
document m and it should be infeasible for an attacker to find another document m′ that has the same hash value.
NIST standardised the secure hash algorithm (SHA-1) in 1995 and it was extremely widely used until some major
cryptanalysis in 2005 (and the first collision in 2017). In 2001 NIST developed the SHA-2 family (SHA-256, SHA-384 and SHA-512) and these are still used. In 2006 NIST initiated a competition to choose a new hash function.
The winner was chosen in 2012 (Keccak, designed by Guido Bertoni, Joan Daemen, Michaël Peeters, and Gilles Van
Assche) and named SHA-3.
Message authentication code (MAC): (Boneh-Shoup Chapter 6; Smart “Crypto made simple” Chapter 14 or
older book Chapter 10; Martin Chapter 6) A MAC system consists of two algorithms Sign and Verify. The algorithm
Sign (possibly randomised) takes a key k and a message m and outputs a tag t. The algorithm Verify takes k, m and t
and outputs 0 (reject) or 1 (accept). We require that
Verify(k, m, Sign(k, m)) = 1.
An attacker against a MAC is an algorithm A that plays the existential MAC forgery under a chosen message
attack game. Such an attacker can choose messages mi and gets back ti = Sign(k, mi). The attacker then has to
output a pair (m, t) ≠ (mi, ti) for all i, such that Verify(k, m, t) = 1, with high probability.
We stress that a MAC is not required to provide secrecy of data. It is not required to be hard to deduce m when
given t. Similarly, encryption alone does not necessarily provide integrity.
CBC-MAC: We now give a general construction of a MAC from a block cipher with n-bit blocks. The main idea
is for the MAC to be the final block of the encryption of the message in CBC mode; however, there are a number of
subtleties. First, let the arbitrarily-long message m be written as the sequence x1 , x2 , . . . , xl of n-bit blocks, where
the message is padded so that the final block xl has exactly n bits (the way this is done is important). Set t to initially
be the all zero binary string of length n and, for i = 1, 2, . . . , l do t = Enc(k, t ⊕ xi ). Then the MAC tag is t. The tag
is verified by re-computing it.
CBC-MAC provides some protection against modification/insertion/deletion of message blocks, but it also has
some weaknesses.
Some simple attacks on CBC-MAC: Here m1∥m2 denotes concatenation of binary strings. (The sketch after this
list demonstrates attack 1 with a toy block cipher.)
1. Let m1 and m2 be messages whose length is an exact number of blocks. Suppose one has correct MAC tags
t1, t2 for m1 and m2. Let B be an n-bit block and let B′ = B ⊕ t1 ⊕ t2. Suppose one has the tag t3 for the
message m1∥B′. Then t3 is also the tag for m2∥B.
2. Let m1 and m2 = x1∥x2∥ · · · ∥xl be two messages (each an exact number of blocks). Get MAC tag t on m1.
Set m′2 = (t ⊕ x1)∥x2∥ · · · ∥xl and let t′ be the MAC tag for m′2. Then t′ is the MAC tag for m1∥m2.
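A Python sketch of CBC-MAC and of attack 1; the keyed hash toy_block_enc merely stands in for the block cipher and is purely illustrative:

    import hashlib

    BLOCK = 16  # bytes per block (n = 128 bits)

    def toy_block_enc(key: bytes, block: bytes) -> bytes:
        """Stand-in for a block cipher: a keyed hash truncated to one block (illustration only)."""
        return hashlib.sha256(key + block).digest()[:BLOCK]

    def xor(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    def cbc_mac(key: bytes, msg: bytes) -> bytes:
        """Basic CBC-MAC over a message that is an exact number of blocks (no padding)."""
        assert len(msg) % BLOCK == 0
        t = bytes(BLOCK)
        for i in range(0, len(msg), BLOCK):
            t = toy_block_enc(key, xor(t, msg[i:i + BLOCK]))
        return t

    key = b"0123456789abcdef"
    m1, m2, B = b"A" * BLOCK, b"B" * BLOCK, b"C" * BLOCK
    t1, t2 = cbc_mac(key, m1), cbc_mac(key, m2)

    # Attack 1: a tag obtained for m1 || B' is also a valid tag for m2 || B.
    B_prime = xor(B, xor(t1, t2))
    t3 = cbc_mac(key, m1 + B_prime)          # tag the attacker legitimately requests
    print(t3 == cbc_mac(key, m2 + B))        # True: forgery for m2 || B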
Authenticated encryption: (Chapter 9 of Boneh-Shoup; Section 6.3.6 of Martin) Suppose Alice and Bob have a
shared key k. They can use k to encrypt data between them, but Bob can never be certain that the data is coming from
Alice. Now suppose Alice and Bob share a pair of keys (kMAC , kENC ). There are two obvious solutions:
1. MAC then encrypt: Compute t = Sign(kMAC, m) and then Enc(kENC, m∥t).
2. Encrypt then MAC: Compute c = Enc(kENC , m) and send (c, Sign(kMAC , c)).
The receiver then can decrypt the message and verify the MAC (or vice versa). There are subtleties with these models
and only Encrypt-then-MAC is always secure. Authenticated encryption is a hot topic and schemes are still being
proposed and standardised at the present time. Some popular choices include GCM, CCM, EAX, OCB etc.
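A minimal Python sketch of the Encrypt-then-MAC structure, using HMAC-SHA256 for the MAC; the stream cipher toy_stream_enc is only a placeholder, not a recommendation:

    import hashlib, hmac, os

    def toy_stream_enc(key: bytes, nonce: bytes, msg: bytes) -> bytes:
        """Placeholder stream cipher (hash-based counter keystream); use a real cipher in practice."""
        stream = b""
        counter = 0
        while len(stream) < len(msg):
            stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
            counter += 1
        return bytes(m ^ s for m, s in zip(msg, stream))

    def encrypt_then_mac(k_enc: bytes, k_mac: bytes, msg: bytes):
        nonce = os.urandom(16)
        c = nonce + toy_stream_enc(k_enc, nonce, msg)
        tag = hmac.new(k_mac, c, hashlib.sha256).digest()   # the MAC covers the whole ciphertext
        return c, tag

    def decrypt(k_enc: bytes, k_mac: bytes, c: bytes, tag: bytes):
        if not hmac.compare_digest(hmac.new(k_mac, c, hashlib.sha256).digest(), tag):
            return None                                     # reject before decrypting
        nonce, body = c[:16], c[16:]
        return toy_stream_enc(k_enc, nonce, body)

    k_enc, k_mac = os.urandom(32), os.urandom(32)
    c, tag = encrypt_then_mac(k_enc, k_mac, b"hello Bob")
    print(decrypt(k_enc, k_mac, c, tag))                          # b'hello Bob'
    print(decrypt(k_enc, k_mac, bytes([c[0] ^ 1]) + c[1:], tag))  # None: tampering is detected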
Hash functions: (Chapter 8 of Boneh-Shoup; Chapter 14/10 of Smart; Chapter 6 of Martin) Hash functions
are a standard notion in computer science: they map arbitrarily long bitstrings to bitstrings of some finite length. A
cryptographic hash function is the same, but is required to satisfy stronger security properties. We denote the set of
all finite-length bitstrings as {0, 1}∗. A hash function is a function h : {0, 1}∗ → {0, 1}^n. Note that hash functions
are always “many-to-one”.
Security properties of hash functions:
1. Preimage Resistant: Given y ∈ {0, 1}^n it should be hard to find x ∈ {0, 1}∗ such that h(x) = y.
2. Second Preimage Resistant: Given x it should be hard to find x′ ≠ x such that h(x′) = h(x).
3. Collision Resistant: It should be hard to find x and x′ with x′ ≠ x and h(x′) = h(x).
Birthday paradox: Suppose one samples n-bit strings uniformly at random. Then on average, after about √(π · 2^n / 2)
strings have been sampled, there will be a collision (i.e., the same string sampled twice). Hence if one chooses around
2^(n/2) random messages xi and computes and stores the values h(xi) then one expects a collision. For this reason, the
output length of a hash function should be chosen to be at least twice the exponent of the largest feasible brute-force
attack (i.e., at least twice the desired security level in bits).
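A quick experiment illustrating the estimate (a Python sketch; SHA-256 truncated to 32 bits plays the role of the random n-bit strings):

    import hashlib, math, os

    n = 32                     # truncate the hash to 32 bits so collisions appear quickly
    seen = set()
    count = 0
    while True:
        count += 1
        h = hashlib.sha256(os.urandom(16)).digest()[: n // 8]   # "random" n-bit string
        if h in seen:
            break
        seen.add(h)

    print("collision after", count, "samples")
    print("birthday estimate:", round(math.sqrt(math.pi * 2**n / 2)))   # about 82137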
Merkle-Damgård Construction: A compression function is a function f : {0, 1}^s → {0, 1}^n, where s > n.
If f is a collision-resistant compression function then one can build a hash function as follows: Pad the message m
so that it is a sequence x1, . . . , xl of blocks of s − n bits. Set h to be some fixed initial value IV. Set h = f(h∥xi)
for i = 1, 2, . . . , l and output h. The construction is not fully secure in this form, and so some additional length
strengthening is applied.
The function SHA-256 is a Merkle-Damgård function where s = 768 and n = 256. MD5 is also a Merkle-Damgård function.
Theorem: If f is collision resistant then h is collision-resistant.
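A minimal sketch of the iteration in Python, with SHA-256 itself standing in as the compression function and a simplified form of length strengthening (the real padding rules differ):

    import hashlib

    BLOCK = 32  # s - n = 256 bits per message block; the chaining value has n = 256 bits

    def compress(chaining: bytes, block: bytes) -> bytes:
        """Stand-in compression function f: {0,1}^512 -> {0,1}^256 (SHA-256 used for illustration)."""
        return hashlib.sha256(chaining + block).digest()

    def md_hash(msg: bytes, iv: bytes = bytes(32)) -> bytes:
        # Length strengthening: pad with zeros, then add a final block encoding the bit length.
        padded = msg + b"\x00" * (-len(msg) % BLOCK)
        padded += (8 * len(msg)).to_bytes(BLOCK, "big")
        h = iv
        for i in range(0, len(padded), BLOCK):
            h = compress(h, padded[i:i + BLOCK])
        return h

    print(md_hash(b"The quick brown fox").hex())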
Sponges and SHA-3: (Section 8.8 of Boneh-Shoup) Instead of the Merkle-Damgård construction, SHA-3 is based
on the concept of a sponge, which allows arbitrary length input and arbitrary length output. We need a permutation
F : {0, 1}^n → {0, 1}^n and write n = r + c (here c is called the “capacity” and r is the “rate”). Suppose the output
length is v bits. The message m is padded and then broken into r-bit blocks x1, . . . , xl. The blocks are then extended
by zeroes to have x′i = xi∥0^c. The absorbing stage of the hash function starts by initialising h = 0^n and then iterates
h = F(h ⊕ x′i) for i = 1, 2, . . . , l. The squeezing stage is to output the first v bits of h. If v > r then, for simplicity,
we suppose v = tr for some integer t: Output the first r bits of h and then set h = F(h) and repeat (always outputting
the first r bits) t − 1 times until tr bits in total are output.
SHA-3 is based on a permutation Keccak from {0, 1}^1600 to itself (this is represented as a 5 × 5 array of 64-bit
words). The SHA-3 standard allows four possible output lengths: 224, 256, 384, 512 bits. The corresponding values
for c are 448, 512, 768, 1024 (and so r = 1152, 1088, 832, 576 respectively).
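A toy sponge in Python, with n = 64, r = c = 32 and a splitmix64-style mixing function standing in for the Keccak permutation (the padding rule is also simplified):

    MASK = (1 << 64) - 1
    R, C = 32, 32                               # toy rate r and capacity c; n = r + c = 64

    def F(x: int) -> int:
        """Toy 64-bit permutation (splitmix64-style mixing; every step is invertible)."""
        x = (x + 0x9E3779B97F4A7C15) & MASK
        x ^= x >> 30
        x = (x * 0xBF58476D1CE4E5B9) & MASK
        x ^= x >> 27
        x = (x * 0x94D049BB133111EB) & MASK
        return x ^ (x >> 31)

    def sponge_hash(msg: bytes, out_bits: int = 64) -> int:
        # Pad to a whole number of r-bit blocks (a single 0x01 byte then zeros; simplified).
        msg += b"\x01" + b"\x00" * (-(len(msg) + 1) % (R // 8))
        h = 0
        # Absorb: XOR each r-bit block (extended by 0^c) into the state, then apply F.
        for i in range(0, len(msg), R // 8):
            block = int.from_bytes(msg[i:i + R // 8], "big") << C   # x_i || 0^c
            h = F(h ^ block)
        # Squeeze: output the first r bits, applying F between output blocks.
        out, bits = 0, 0
        while bits < out_bits:
            out = (out << R) | (h >> C)
            bits += R
            h = F(h)
        return out

    print(hex(sponge_hash(b"hello sponge")))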
HMAC: (Section 8.7 of Boneh-Shoup; Section 6.3.4 of Martin) The problem is to get a MAC from a hash function
h (the difference is that a MAC has a key). A simple idea is to let the key k be some n-bit string and define the MAC
by t = Sign(k, m) = h(k∥m). However, if h is built using the Merkle-Damgård construction then this is not secure:
given a MAC tag on m it may be possible to compute the MAC tag for messages of the form m∥m′ (a length-extension
attack). Instead, the idea of HMAC is to choose two keys k1, k2 and define the tag to be
h(k2∥h(k1∥m)).
The HMAC standard chooses k1 = k ⊕ ipad and k2 = k ⊕ opad. Note that the “outer” application of h is on a short
input; only the “inner” hash function is applied to the full message.
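Python's standard library provides HMAC directly; the sketch below also spells out the two-key construction by hand (for a key shorter than the hash block size) and checks that the two computations agree:

    import hashlib, hmac

    key, msg = b"secret key", b"message to authenticate"

    # HMAC by hand: outer hash over (k XOR opad) || inner hash of (k XOR ipad) || m.
    block = 64                                   # SHA-256 block size in bytes
    k = key.ljust(block, b"\x00")
    k1 = bytes(b ^ 0x36 for b in k)              # k XOR ipad
    k2 = bytes(b ^ 0x5C for b in k)              # k XOR opad
    tag = hashlib.sha256(k2 + hashlib.sha256(k1 + msg).digest()).digest()

    # The standard library computes the same value.
    print(tag == hmac.new(key, msg, hashlib.sha256).digest())   # True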
Other hash systems: The (early) Unix password system can be thought of as hashing the password, in which case
the hash function is built by iterating DES.
7 Public key cryptography
Public key cryptography was developed in the early 1970s at GCHQ by Ellis, Cocks and others. In the mid 1970s
academic cryptographers also developed the main ideas, particularly Merkle, Diffie and Hellman, followed by
Adleman, Rivest and Shamir.
The concept is to allow Alice and Bob to communicate securely without having a shared key. Bob has a public key
and a private key. To send a message to Bob it is only necessary to know his public key. To decrypt the message one
uses the private key. It must be computationally infeasible to deduce the private key from the public key.
Some number theory: Let p > 2 be a prime. Let a be an integer that is not a multiple of p. One can compute
powers of a modulo p. For example, the powers of a = 4 modulo p = 19 are a^2 = 16, a^3 = 7, a^4 = 9, a^5 = 17 etc.
An important fact (this is “Fermat’s little theorem”) is
a^(p−1) ≡ 1 (mod p).
Example: Compute 2^1000 (mod 7) and 3^1001 (mod 11).
One-way function: Given a, n, p it is efficient to compute c = a^n (mod p) using the square-and-multiply algorithm. Given (a, c, p) it is not efficient to compute an integer n such that c = a^n (mod p); this is called the discrete
logarithm problem (DLP).
Example: Compute 2^17 (mod 37).
Example: Find n such that 3^n ≡ 6 (mod 23).
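These examples are easy to check with Python's built-in three-argument pow, which implements square-and-multiply; the exhaustive search for the discrete logarithm is only feasible because the modulus is tiny:

    # Fast modular exponentiation: pow(a, n, p).
    print(pow(2, 1000, 7))    # 2^1000 ≡ 2 (mod 7): 2^6 ≡ 1 and 1000 = 6*166 + 4, 2^4 = 16 ≡ 2
    print(pow(3, 1001, 11))   # 3^1001 ≡ 3 (mod 11): 3^10 ≡ 1 and 1001 = 10*100 + 1
    print(pow(2, 17, 37))

    # The discrete logarithm direction has no comparably fast general method;
    # for tiny parameters we can simply search: find n with 3^n ≡ 6 (mod 23).
    print(next(n for n in range(1, 23) if pow(3, n, 23) == 6))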
Generating primes: It is relatively efficient to generate large primes because:
1. Primes are relatively common;
2. Primality testing is efficient.
Indeed, Fermat’s little theorem is the basis for most primality tests.
Diffie-Hellman Key Exchange: Alice and Bob wish to agree on a shared random key (so that they can do symmetric crypto even though they have never met). The system parameters are a prime p and an element g.
Alice chooses an integer a, computes c1 = g^a (mod p) and sends c1 to Bob.
Bob chooses an integer b, computes c2 = g^b (mod p) and sends c2 to Alice.
Alice receives c2 and computes k = c2^a (mod p). Bob receives c1 and computes k = c1^b (mod p). Both obtain
the same key k = g^(ab) (mod p).
Diffie-Hellman problem: Given g, g^a and g^b (mod p), compute g^(ab) (mod p).
Note that this protocol does not provide any authentication.
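A sketch of the exchange with toy parameters (real deployments use much larger primes, or elliptic curves):

    import secrets

    p = 2**61 - 1          # a prime; far too small for real use, fine for illustration
    g = 3

    a = secrets.randbelow(p - 2) + 1        # Alice's secret exponent
    b = secrets.randbelow(p - 2) + 1        # Bob's secret exponent
    c1 = pow(g, a, p)                       # sent to Bob
    c2 = pow(g, b, p)                       # sent to Alice

    k_alice = pow(c2, a, p)
    k_bob = pow(c1, b, p)
    print(k_alice == k_bob == pow(g, a * b, p))   # True: both sides hold g^(ab) mod p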
A simple symmetric cryptosystem: Fix a prime p and let a, b be integers such that ab = 2p − 1 = 2(p − 1) + 1.
(For example p = 311, a = 27 and b = 23.) To encrypt a message 1 ≤ m < p, Alice computes c = m^a (mod p). To
decrypt a ciphertext c, Bob computes c^b (mod p). (Decryption works by Fermat’s little theorem, since c^b = m^(ab) =
m^(2(p−1)+1) ≡ m (mod p).)
This works, but it is not very interesting in practice. Unfortunately it cannot be turned into a public-key cryptosystem: If we publish (a, p) then anyone can compute the private key b.
Generalisation: Choose any two integers a, b such that ab ≡ 1 (mod p − 1). For this we need a to be coprime to
p − 1; then b can be computed using the extended Euclidean algorithm, and b is also coprime to p − 1.
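Using the numbers from the example above, with pow computing both the modular inverse and the modular exponentiations:

    p = 311
    a = 27
    b = pow(a, -1, p - 1)          # a*b ≡ 1 (mod p-1); here b = 23, matching the example

    m = 100
    c = pow(m, a, p)               # encrypt: c = m^a mod p
    print(pow(c, b, p) == m)       # decrypt: c^b = m^(ab) ≡ m (mod p), so this prints True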
Public key encryption: A public key encryption scheme comprises three algorithms: KeyGen, Enc, Dec.
• KeyGen is a randomised algorithm that outputs a public key pk and a private key sk.
• Enc(pk, m) takes a public key and a message and returns a ciphertext c.
• Dec(sk, c) takes a private key and a ciphertext and returns a message m or the invalid ciphertext symbol ⊥.
We require Dec(sk, Enc(pk, m)) = m.