University of Tokyo: Advanced Algorithms
Summer 2010
Lecture 3 — 22 April
Lecturer: François Le Gall
Scribe: Florian Wagner

3.1 A Motivating Example: Fingerprinting
In this example, we look at a game with two players, Alice and Bob.
Alice has a binary string $a \in \{0,1\}^k$ and Bob has a binary string $b \in \{0,1\}^k$. Bob has to decide whether $a = b$. As a constraint, only Alice can send messages to Bob. The question is: “how many bits are needed for this communication?”
Goal: A protocol with as little communication as possible.
Trivial protocol: Alice sends the entire string $a$. This requires $k$ bits.
Randomized protocol: We now discuss a protocol that uses only $O(\log k)$ bits. The protocol is very simple and consists of 3 steps. First, we define $\bar{a}$ and $\bar{b}$ as the integers encoded by the strings $a$ and $b$:
$$\bar{a} = \sum_{i=1}^{k} a_i 2^{i-1}, \qquad \bar{b} = \sum_{i=1}^{k} b_i 2^{i-1}, \qquad \bar{a}, \bar{b} \in \{0, 1, \ldots, 2^k - 1\}.$$
The steps of the algorithm are:
1. Alice takes a random prime $p \in \{2, 3, \ldots, t\}$, where $t$ is fixed later on. It is not trivial to generate random primes, but it can be done efficiently, as we will see in the next lecture.
2. Alice sends Bob two things: the prime $p$ and the remainder $(\bar{a} \bmod p) \in \{0, \ldots, p-1\}$. Both numbers are at most $t$, so the message has $O(\log t)$ bits.
3. Bob checks whether $(\bar{a} \bmod p) = (\bar{b} \bmod p)$.
We now analyze the success probability of this algorithm. If $a = b$ then $\Pr[(\bar{a} \bmod p) = (\bar{b} \bmod p)] = 1$. If $a \neq b$ then
$$\Pr[(\bar{a} \bmod p) = (\bar{b} \bmod p)] \le \frac{\tau(k)}{\pi(t)},$$
where $\pi(t)$ is the number of primes in the set $\{2, 3, \ldots, t\}$ and $\tau(k)$ is the maximum number of distinct prime divisors of an element of the set $\{1, \ldots, 2^k - 1\}$. Indeed, $(\bar{a} \bmod p) = (\bar{b} \bmod p)$ if and only if $p$ divides $|\bar{a} - \bar{b}|$, and $|\bar{a} - \bar{b}| \in \{1, \ldots, 2^k - 1\}$ when $a \neq b$. Next, we evaluate this ratio.
Fact. $\tau(k) \le \log_2(2^k - 1) \le k$.
Proof: Every prime is at least 2, so an integer with $r$ distinct prime divisors is at least $2^r$. □
Now we bound $\pi(t)$ using the following well-known theorem.
Theorem 3.1 (Prime number theorem). $\pi(t) \sim \frac{t}{\ln(t)}$ as $t \to \infty$.
Moreover,
$$\frac{t}{\ln(t)} \le \pi(t) \le 1.26 \, \frac{t}{\ln(t)}$$
for any $t \ge 17$ (even though this is a weaker bound, it is more convenient in practice).
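The explicit bounds can be checked numerically for small $t$. The following is a minimal sketch, not part of the lecture: the function names `prime_counts` and `bounds_hold` are hypothetical, and $\pi(t)$ is computed with a plain sieve of Eratosthenes.

```python
import math

def prime_counts(limit):
    """Return a list pi where pi[t] is the number of primes in {2, ..., t}.

    Assumes limit >= 1.
    """
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, math.isqrt(limit) + 1):
        if is_prime[i]:
            # Cross out every multiple of i starting at i*i.
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    pi, count = [], 0
    for t in range(limit + 1):
        count += is_prime[t]
        pi.append(count)
    return pi

def bounds_hold(limit):
    """Check t/ln(t) <= pi(t) <= 1.26 t/ln(t) for every integer 17 <= t <= limit."""
    pi = prime_counts(limit)
    return all(
        t / math.log(t) <= pi[t] <= 1.26 * t / math.log(t)
        for t in range(17, limit + 1)
    )
```

For instance, `bounds_hold(10000)` checks every integer $t$ from 17 to 10000. Such a check only illustrates the statement and proves nothing about larger $t$.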
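Suppose that $a \neq b$. Combining $\tau(k) \le k$ with the lower bound $\pi(t) \ge \frac{t}{\ln(t)}$ gives:
$$\Pr[(\bar{a} \bmod p) = (\bar{b} \bmod p)] \le \frac{k \ln(t)}{t}.$$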
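Let $t = c\,k \ln(k)$ ($c$ is a constant):
$$\frac{k \ln(t)}{t} = \frac{k \big( \ln(c) + \ln(k) + \ln(\ln(k)) \big)}{c\,k \ln(k)} = \frac{1}{c} + o(1),$$
since $\ln(k)$ is the dominant part of the numerator. With this choice of $t$ we have $\log t = O(\log k)$, so Alice's message has $O(\log k)$ bits.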
Summary: If $a$ and $b$ are not equal, then the probability that the test fails (Bob wrongly concludes $a = b$) is at most $\frac{1}{c} + o(1)$. Repeating the test or increasing the constant $c$ reduces this error probability.
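The three steps of the protocol can be sketched in a few lines of Python. This is only an illustration under stated assumptions: all function names are hypothetical, $k \ge 1$ is assumed, and the random prime is drawn by listing all primes up to $t$ with a sieve, which is a simple stand-in and not the efficient prime generation announced for the next lecture.

```python
import math
import random

def primes_up_to(t):
    """All primes in {2, ..., t} via the sieve of Eratosthenes (assumes t >= 2)."""
    sieve = [True] * (t + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, math.isqrt(t) + 1):
        if sieve[i]:
            for j in range(i * i, t + 1, i):
                sieve[j] = False
    return [i for i, is_p in enumerate(sieve) if is_p]

def to_int(bits):
    """Integer encoded by a bit string a_1 a_2 ... a_k: sum of a_i * 2^(i-1)."""
    return sum(int(bit) << i for i, bit in enumerate(bits))

def threshold(k, c):
    """t = c * k * ln(k), rounded up; clamped to 2 so that a prime exists."""
    return max(2, math.ceil(c * k * math.log(k)))

def alice_message(a, c=4, rng=random):
    """Steps 1 and 2: pick a random prime p <= t and send (p, a_bar mod p)."""
    p = rng.choice(primes_up_to(threshold(len(a), c)))
    return p, to_int(a) % p

def bob_accepts(b, message):
    """Step 3: compare Alice's fingerprint with b_bar mod p."""
    p, fingerprint = message
    return to_int(b) % p == fingerprint

def error_probability(a, b, c=4):
    """Exact probability over p that Bob accepts, i.e. that p divides |a_bar - b_bar|."""
    primes = primes_up_to(threshold(len(a), c))
    diff = abs(to_int(a) - to_int(b))
    return sum(1 for p in primes if diff % p == 0) / len(primes)
```

Calling `bob_accepts(b, alice_message(a))` runs one round of the protocol. For two different strings of the same length, `error_probability(a, b)` returns the exact fraction of primes on which Bob is fooled, which can be compared with the bound $\frac{k \ln(t)}{t}$ derived above.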
These kinds of algorithms are called fingerprinting algorithms. A schematic description of
the technique is shown in the next figure:
The key idea is the same idea as in hashing: we hash a big set to a much smaller set,
trying to avoid collisions.
3.2 Universal Hashing
3.2.1 Hash tables, as used for data structures
First, one hash function $h : M \to N$ is fixed, where $|N| \ll |M|$.
We want to insert data from $M$ into a table $A$ indexed by $N$ with as few collisions as possible (the data $x \in M$ is inserted into the entry $A[h(x)]$).
Requirements:
- N is small, we want to map a huge amount of data into a much smaller data space
- h is easy to evaluate
- Only few collisions (a collision is a pair $(x, y)$ with $x, y \in M$, $x \neq y$ and $h(x) = h(y)$)
We want to design functions with few collisions. Of course, collisions are inevitable: for any fixed function there exist bad input sets for which a lot of collisions occur. Usually one does an average-case analysis, which means that these bad input sets are neglected because they do not occur that often.
In this lecture we take a theoretical approach that works even for worst-case inputs. To achieve this, we pick the hash function at random from a family of many functions.
3.2.2 Universal Hashing
Idea: Take a random hash function from a family of functions. Then we can prove that,
for any set of inputs, there are only few collisions. We now state the formal definition.
Definition: Let $M$ and $N$ be two sets, with $|M| \ge |N|$. Let $H$ be a family of functions from $M$ to $N$. The family $H$ is a universal hashing family if for any $x, y \in M$ with $x \neq y$ the following property holds:
$$\Pr_{h \in H}[h(x) = h(y)] \le \frac{1}{|N|}.$$
Example: Let $F_{M,N}$ be the family of all functions from $M$ to $N$.
If $x \neq y$ then:
$$\Pr_{h \in F_{M,N}}[h(x) = h(y)] = \frac{1}{|N|}.$$
Problems:
1. This set is very large
2. These functions might be difficult to evaluate
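For very small sets, the example and the first problem can both be checked by brute force. The sketch below is an illustration only: the function names are hypothetical, $M$ and $N$ are taken to be $\{0, \ldots, m-1\}$ and $\{0, \ldots, n-1\}$, and each function is represented by its table of values.

```python
from fractions import Fraction
from itertools import product

def family_size(m, n):
    """Number of functions from a set of size m to a set of size n."""
    return n ** m

def collision_probability_all_functions(m, n, x, y):
    """Exact Pr[h(x) = h(y)] for h uniform over all functions {0..m-1} -> {0..n-1}."""
    # A function h is represented by the tuple (h(0), ..., h(m-1)).
    tables = list(product(range(n), repeat=m))
    collisions = sum(1 for h in tables if h[x] == h[y])
    return Fraction(collisions, len(tables))
```

With $m = 4$ and $n = 3$ there are already $3^4 = 81$ functions, and the number grows as $|N|^{|M|}$, which is why enumerating or even storing a random member of $F_{M,N}$ is impractical for realistic sizes.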
We will now construct families of functions that look as if they were taken from $F_{M,N}$. Of course these sets are not the same, but locally, that is, with respect to any fixed pair of inputs, they look like it.
Moreover, we want to construct easy functions and only a small number of them. We will now see two illustrative examples of how to construct these families. The fingerprinting algorithm that we just saw also uses a set of hash functions, but they are not universal: the failure probability was basically $\frac{1}{c}$, whereas we want $\frac{1}{|N|}$.
3.2.3 Construction of universal hashing families: “The Matrix Method”
This one is an illustrative example, but not so convenient in terms of the results.
Definitions:
$M = \{0,1\}^u$, the set of strings of length $u$
$N = \{0,1\}^v$, the set of strings of length $v$, where $v \le u$
Let $\mathcal{A}$ be the set of matrices of size $v \times u$ over $\mathbb{F}_2 = \{0, 1\}$, where $\mathbb{F}_2$ is the Boolean field (i.e., $1 + 1 = 0$). For each $A \in \mathcal{A}$ we define the function $h_A$ as follows:
$$h_A : M \to N, \qquad x \mapsto Ax.$$
Theorem 3.2. $H = \{ h_A \mid A \in \mathcal{A} \}$ is a universal hashing family.
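Proof: Take two elements $x$ and $y$ in $M$ such that $x \neq y$. As they are different, there exists an index $i_0$ such that $x_{i_0} \neq y_{i_0}$, that is, $x_{i_0} - y_{i_0} = 1$. Suppose that $i_0 = 1$, so $x_1 - y_1 = 1$ (the other cases are symmetric). Then
$$
\begin{aligned}
\Pr_{A \in \mathcal{A}}[h_A(x) = h_A(y)] &= \Pr_A[A(x - y) = 0] \\
&= \Pr_A\Big[ \sum_{j=1}^{u} A_{ij}(x_j - y_j) = 0 \;\; \forall i \in \{1, \ldots, v\} \Big] \\
&= \Pr_A\Big[ A_{i1} = -\frac{\sum_{j=2}^{u} A_{ij}(x_j - y_j)}{(x_1 - y_1)} \;\; \forall i \in \{1, \ldots, v\} \Big].
\end{aligned}
$$
Here we used the following notation:
$$
A = \begin{pmatrix}
A_{11} & A_{12} & \cdots & A_{1u} \\
A_{21} & \ddots & & \vdots \\
\vdots & & \ddots & \vdots \\
A_{v1} & A_{v2} & \cdots & A_{vu}
\end{pmatrix}.
$$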
Notice that taking a random matrix in the set $\mathcal{A}$ is equivalent to taking a matrix with independent random entries in $\mathbb{F}_2$. Without loss of generality we can suppose that the entries in the last $u - 1$ columns are taken first. For each entry $A_{i1}$ in the first column, there is then only one value (among the two values in $\mathbb{F}_2$) that can produce a collision for $x$ and $y$: the value $A_{i1} = -\frac{\sum_{j=2}^{u} A_{ij}(x_j - y_j)}{(x_1 - y_1)}$. Each of the $v$ entries of the first column takes this value with probability $\frac{1}{2}$, independently of the others. We conclude that
$$\Pr_{A \in \mathcal{A}}[h_A(x) = h_A(y)] = \frac{1}{2^v} = \frac{1}{|N|}.$$
□
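For tiny parameters the theorem can be confirmed by enumerating every matrix. The following sketch is an illustration only: the function names are hypothetical, and strings and matrix rows are represented as tuples of bits.

```python
from fractions import Fraction
from itertools import product

def hash_matrix(A, x):
    """Compute h_A(x) = Ax over F_2; A is a tuple of v rows, x a tuple of u bits."""
    return tuple(sum(a * b for a, b in zip(row, x)) % 2 for row in A)

def collision_probability_matrix(u, v, x, y):
    """Exact Pr_A[h_A(x) = h_A(y)] over all v-by-u matrices with entries in F_2."""
    rows = list(product((0, 1), repeat=u))
    matrices = list(product(rows, repeat=v))  # 2^(u*v) matrices in total
    collisions = sum(
        1 for A in matrices if hash_matrix(A, x) == hash_matrix(A, y)
    )
    return Fraction(collisions, len(matrices))
```

For $u = 3$ and $v = 2$, the call `collision_probability_matrix(3, 2, (1, 0, 1), (0, 0, 1))` enumerates all $2^6 = 64$ matrices. The enumeration is exponential in $uv$, so it is only usable as a sanity check.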
3.2.4 Construction of universal hashing families: a number-theoretic method
This one will be more efficient: the size of the family will be smaller and the functions easier
to evaluate. This construction is used in complexity theory.
$M = \{0, 1, \ldots, m - 1\}$
$N = \{0, 1, \ldots, n - 1\}$, with $m \ge n$
Take a prime $p$ such that $m \le p \le 2m$. There is always a prime between $m$ and $2m$; this will be shown in the next lectures. Let $\mathbb{Z}_p = \{0, 1, \ldots, p - 1\} = GF(p)$ be the Galois field with the operations $\oplus_p$, $\odot_p$ (addition and multiplication modulo $p$).
For any $a \in \mathbb{Z}_p$ and $b \in \mathbb{Z}_p$ with $a \neq 0$ we define:
$$h_{a,b} : M \to N, \qquad x \mapsto g_{a,b}(x) \bmod n,$$
where $g_{a,b}$ denotes the function such that $g_{a,b}(x) = ax + b \bmod p$.
Theorem 3.3. $H = \{ h_{a,b} \mid a \in \mathbb{Z}_p, b \in \mathbb{Z}_p, a \neq 0 \}$ is a universal hashing family.
Proof: Let us take $x, y \in M$ and $a \in \mathbb{Z}_p$, $b \in \mathbb{Z}_p$ such that $x \neq y$ and $a \neq 0$.
Then $x - y \neq 0 \bmod p$ (because $|x - y| \in \{1, \ldots, m - 1\}$ and $m \le p$). We also claim that $g_{a,b}(x) \neq g_{a,b}(y)$. Proof:
Assume that $ax + b = ay + b \bmod p$
$\Rightarrow a(x - y) = 0 \bmod p$
$\Rightarrow (x - y) = 0 \bmod p$, since $a \neq 0$ and $p$ is prime
$\Rightarrow$ Contradiction!
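Now we bound the probability of a collision. We want to show that
$$\Pr_{\substack{a \in \mathbb{Z}_p,\, a \neq 0 \\ b \in \mathbb{Z}_p}}[h_{a,b}(x) = h_{a,b}(y)] \le \frac{1}{|N|}.$$
We have
$$
\begin{aligned}
\Pr_{a,b}[h_{a,b}(x) = h_{a,b}(y)] &= \sum_{\substack{s \in \mathbb{Z}_p \\ t \in \mathbb{Z}_p}} \Pr_{a,b}\big[ (h_{a,b}(x) = h_{a,b}(y)) \wedge (g_{a,b}(x) = s) \wedge (g_{a,b}(y) = t) \big] \\
&= \sum_{\substack{s, t \in \mathbb{Z}_p \\ s \neq t \\ s = t \bmod n}} \Pr_{a,b}\big[ g_{a,b}(x) = s \wedge g_{a,b}(y) = t \big].
\end{aligned}
$$
The second equality holds because $h_{a,b}(x) = h_{a,b}(y)$ means $g_{a,b}(x) = g_{a,b}(y) \bmod n$, that is, $s = t \bmod n$, and because $g_{a,b}(x) \neq g_{a,b}(y)$ rules out $s = t$.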
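Now two claims:
Claim 1: For any $s, t \in \mathbb{Z}_p$ with $s \neq t$,
$$\Pr_{\substack{a \in \mathbb{Z}_p,\, a \neq 0 \\ b \in \mathbb{Z}_p}}[g_{a,b}(x) = s \wedge g_{a,b}(y) = t] = \frac{1}{p(p - 1)}.$$
Proof of claim 1: If we rewrite the formula, we get $ax + b = s \bmod p$ and $ay + b = t \bmod p$. Since $x \neq y \bmod p$, this is a system of independent linear equations over $\mathbb{Z}_p$, and thus there is exactly one solution $(a, b)$. This solution satisfies $a \neq 0$ because $s \neq t$, so exactly one of the $p(p - 1)$ admissible pairs $(a, b)$ works.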
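Claim 2: $\big| \{ (s, t) \mid s, t \in \mathbb{Z}_p,\ s \neq t,\ s = t \bmod n \} \big| \le p \left( \left\lceil \frac{p}{n} \right\rceil - 1 \right) \le \frac{p(p - 1)}{n}$.
Proof of claim 2: We have to count the number of such pairs $(s, t)$. The idea is that there are:
• $p$ choices for $s$;
• at most $\lceil \frac{p}{n} \rceil - 1$ possibilities for $t$ once $s$ is fixed.
The last inequality holds because $\lceil \frac{p}{n} \rceil \le \frac{p + n - 1}{n}$.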
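The counting argument can be checked by brute force for small values. The sketch below is an illustration only, with hypothetical function names; it counts the pairs directly and returns the count together with the two quantities $p(\lceil p/n \rceil - 1)$ and $p(p-1)/n$.

```python
from fractions import Fraction

def count_pairs(p, n):
    """Number of pairs (s, t) in Z_p x Z_p with s != t and s = t (mod n)."""
    return sum(
        1
        for s in range(p)
        for t in range(p)
        if s != t and s % n == t % n
    )

def claim2_values(p, n):
    """Return (exact count, p * (ceil(p/n) - 1), p * (p - 1) / n)."""
    ceil_p_over_n = -(-p // n)  # integer ceiling of p / n
    return (
        count_pairs(p, n),
        p * (ceil_p_over_n - 1),
        Fraction(p * (p - 1), n),
    )
```

For $p = 5$ and $n = 2$, the residue classes modulo 2 inside $\mathbb{Z}_5$ are $\{0, 2, 4\}$ and $\{1, 3\}$, so there are $3 \cdot 2 + 2 \cdot 1 = 8$ pairs, while both other quantities equal 10. The count can therefore be strictly smaller than $p(\lceil p/n \rceil - 1)$, which is all the final bound needs.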
From these two claims we conclude that
$$\Pr_{\substack{a \in \mathbb{Z}_p,\, a \neq 0 \\ b \in \mathbb{Z}_p}}[h_{a,b}(x) = h_{a,b}(y)] \le \frac{1}{p(p - 1)} \cdot \frac{p(p - 1)}{n} = \frac{1}{n}.$$
□
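As a sanity check, the collision probability of this family can be computed exactly for small parameters by enumerating all pairs $(a, b)$. The sketch below is an illustration only: the function names are hypothetical and the prime $p$ is supplied by the caller.

```python
from fractions import Fraction

def h(a, b, p, n, x):
    """h_{a,b}(x) = ((a*x + b) mod p) mod n."""
    return ((a * x + b) % p) % n

def collision_probability(p, n, x, y):
    """Exact Pr[h_{a,b}(x) = h_{a,b}(y)] over a in {1..p-1} and b in {0..p-1}."""
    collisions = sum(
        1
        for a in range(1, p)
        for b in range(p)
        if h(a, b, p, n, x) == h(a, b, p, n, y)
    )
    return Fraction(collisions, p * (p - 1))

def is_universal(m, p, n):
    """Check the universal hashing property for every pair x != y in {0..m-1}."""
    return all(
        collision_probability(p, n, x, y) <= Fraction(1, n)
        for x in range(m)
        for y in range(m)
        if x != y
    )
```

For example, with $m = 10$, the prime $p = 11$ (which lies between $m$ and $2m$) and $n = 4$, `is_universal(10, 11, 4)` tests all 90 ordered pairs. The family has only $p(p-1)$ members, each described by the two numbers $a$ and $b$, in contrast with the $|N|^{|M|}$ functions of $F_{M,N}$.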