Randomized Algorithms
CS648
Lecture 12
Hashing - II
RECAP OF LAST LECTURE
Problem Definition
• $U = \{1, 2, \dots, m\}$ called universe
• $S \subseteq U$ and $s = |S|$
• $s \ll m$
Examples:
$m = 10^{18}$, $s = 10^{3}$
Aim
Given a set $S$, build a data structure storing $S$ such that we can answer the following query in O(1) time:
“Does $j \in S$?” for any given $j \in U$.
Hashing
• Hash table:
$T$: an array of size $n$.
• Hash function
$h: U \to [n]$, where $[n] = \{0, 1, \dots, n-1\}$
Answering a Query: “Does $j \in S$?”
1. $k \leftarrow h(j)$;
2. Search the list stored at $T[k]$.
Properties of $h$:
• $h(j)$ computable in O(1) time.
• Space required by $h$: O(1).
How many bits are needed to encode $h$?
[Figure: hash table $T$ with slots $0, 1, \dots, n-1$; the elements of $S$ are stored in the lists at these slots.]
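The two-step query above can be sketched as follows. This is a minimal sketch assuming collisions are resolved by chaining (each slot $T[k]$ holds a list), with the simple function $h(j) = j \bmod n$ and table size $n = s$ chosen purely for illustration; `build_table` and `contains` are hypothetical names.

```python
from typing import Callable, List

def build_table(S: List[int], n: int, h: Callable[[int], int]) -> List[List[int]]:
    """Hash table T: an array of n lists; T[k] holds every element j of S with h(j) = k."""
    T: List[List[int]] = [[] for _ in range(n)]
    for j in S:
        T[h(j)].append(j)
    return T

def contains(T: List[List[int]], h: Callable[[int], int], j: int) -> bool:
    """Answer "Does j belong to S?": k <- h(j), then search the list stored at T[k]."""
    k = h(j)
    return j in T[k]  # cost: O(1) plus the length of the list T[k]

# Usage with a hypothetical hash function h(j) = j mod n.
S = [3, 17, 42, 1000, 123456789]
n = len(S)

def h(j: int) -> int:
    return j % n

T = build_table(S, n, h)
print(contains(T, h, 42), contains(T, h, 43))  # True False
```

The search cost is exactly the length of one list, which is why the rest of the lecture focuses on bounding collisions.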
Collision
Definition: Two elements $i, j \in U$ are said to collide under hash function $h$ if $h(i) = h(j)$.
Worst case time complexity of searching an item $j$:
the number of elements in $S$ colliding with $j$.
Universal Hash Family
Definition: A collection $\mathcal{H}$ of hash functions is said to be universal if there exists a constant $c$ such that for any $i, j \in U$ with $i \ne j$,
$\Pr_{h \in_r \mathcal{H}}[h(i) = h(j)] \le \frac{c}{n}$
This definition may appear strange at first! But we shall soon see that there is a very natural way to arrive at it.
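Perfect hashing using $O(s^2)$ space
Let $\mathcal{H}$ be a universal hash family.
Let $X$ be the number of collisions (colliding pairs) for $S$ when $h \in_r \mathcal{H}$.
Question: What is $E[X]$?
$X_{i,j} = 1$ if $h(i) = h(j)$, and $X_{i,j} = 0$ otherwise.
$X = \sum_{i<j \text{ and } i,j \in S} X_{i,j}$
$E[X] = \sum_{i<j \text{ and } i,j \in S} E[X_{i,j}] = \sum_{i<j \text{ and } i,j \in S} \Pr[X_{i,j} = 1] \le \sum_{i<j \text{ and } i,j \in S} \frac{c}{n} = \frac{c}{n} \cdot \frac{s(s-1)}{2}$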
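The collision count $X$ can be computed from the list sizes alone (a list of size $b$ contributes $b(b-1)/2$ pairs), and for a small family $E[X]$ can be computed exactly by enumerating every member. A sketch, assuming the family $h_x(i) = (ix \bmod p) \bmod n$ analysed later in this lecture (for which $c = 2$); `count_collisions` and `expected_collisions` are illustrative names, and the set $S$ is an arbitrary example.

```python
from collections import Counter
from typing import Callable, Iterable, List

def count_collisions(S: Iterable[int], h: Callable[[int], int]) -> int:
    """X = number of pairs {i, j} of S with h(i) = h(j)."""
    sizes = Counter(h(i) for i in S)           # list size per slot
    return sum(b * (b - 1) // 2 for b in sizes.values())

def expected_collisions(S: List[int], p: int, n: int) -> float:
    """Exact E[X] when h = h_x with x uniform in {1, ..., p-1} (assumed family)."""
    total = sum(count_collisions(S, lambda i, x=x: (i * x % p) % n) for x in range(1, p))
    return total / (p - 1)

S = [2, 3, 5, 11, 19, 29, 40]      # arbitrary example set, s = 7
p, n = 101, 7                       # p: a prime larger than every element of S
s = len(S)
bound = (2 / n) * s * (s - 1) / 2   # (c/n) * s(s-1)/2 with c = 2
print(expected_collisions(S, p, n), "<=", bound)
```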
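Lemma1: $E[X] \le \frac{c}{n} \cdot \frac{s(s-1)}{2}$.
Lemma2: For $n = cs^2$, there will be no collision with probability at least $\frac{1}{2}$.
(By Lemma1, $E[X] \le \frac{s-1}{2s} < \frac{1}{2}$ when $n = cs^2$, so by Markov's inequality $\Pr[X \ge 1] < \frac{1}{2}$.)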
Algorithm1: Perfect hashing for $S$
Fix $n = cs^2$;
Repeat
1. Pick $h \in_r \mathcal{H}$;
2. $X \leftarrow$ the number of collisions for $S$ under $h$.
Until $X = 0$.
Build the hash table.
Theorem: A perfect hash function for $S$ can be computed in expected $O(s^2)$ time (by Lemma2, the Repeat loop runs at most twice in expectation).
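Algorithm1 can be sketched as below. This is a minimal sketch under assumptions the slides do not fix: the universal family is taken to be $h_x(i) = (ix \bmod p) \bmod n$ (analysed later in this lecture, with $c = 2$), $p$ is a prime larger than $m$, and `next_prime`, `perfect_hash` and `lookup` are hypothetical names. Instead of counting all of $X$, each round stops at the first collision, which is equivalent to testing $X = 0$.

```python
import random
from typing import List, Optional, Tuple

def is_prime(q: int) -> bool:
    """Trial division; adequate for the small primes in this sketch."""
    if q < 2:
        return False
    d = 2
    while d * d <= q:
        if q % d == 0:
            return False
        d += 1
    return True

def next_prime(q: int) -> int:
    """Smallest prime strictly larger than q."""
    q += 1
    while not is_prime(q):
        q += 1
    return q

def perfect_hash(S: List[int], m: int, c: int = 2,
                 rng: Optional[random.Random] = None) -> Tuple[int, int, int, List[Optional[int]]]:
    """Algorithm1 for S, a subset of {1, ..., m}: returns (x, p, n, T) for h_x and its table."""
    if rng is None:
        rng = random.Random()
    s = len(S)
    n = c * s * s                      # Fix n = c s^2
    p = next_prime(m)                  # assumed: a prime larger than every key in U
    while True:                        # Repeat
        x = rng.randint(1, p - 1)      # 1. pick h_x uniformly at random
        T: List[Optional[int]] = [None] * n
        for i in S:                    # 2. hash S, stopping at the first collision
            k = (i * x % p) % n
            if T[k] is not None:
                break
            T[k] = i
        else:                          # Until X = 0 (loop finished without a collision)
            return x, p, n, T

def lookup(D: Tuple[int, int, int, List[Optional[int]]], j: int) -> bool:
    x, p, n, T = D
    return T[(j * x % p) % n] == j     # a single probe: worst-case O(1)

rng = random.Random(2024)
S = rng.sample(range(1, 10**6 + 1), 50)
D = perfect_hash(S, 10**6, rng=rng)
print(D[2], all(lookup(D, j) for j in S))  # 5000 True
```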
HASHING WITH OPTIMAL SPACE AND
WORST CASE O(1) SEARCH TIME
Optimal space hashing with
worst case O(1) search time
Let $\mathcal{H}$ be a universal hash family.
$X$: number of collisions for $S$ when $h \in_r \mathcal{H}$.
Lemma1: $E[X] \le \frac{c}{n} \cdot \frac{s(s-1)}{2}$.
Question: What is $E[X]$ when $n = s$?
Answer: At most $\frac{c(s-1)}{2}$.
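Lemma1: $E[X] \le \frac{s-1}{2}$ when $n = cs$.
Algorithm:
Fix $n = cs$;
Repeat
1. Pick $h \in_r \mathcal{H}$;
2. $X \leftarrow$ no. of collisions for $S$ under $h$;
Until $X \le s$;
Build the hash table; // primary hash table
For each $0 \le i < n$
If size of list $T[i] > 1$
1. Build a perfect hash table for list $T[i]$;
2. Make $T[i]$ point to this hash table;
(By Markov's inequality, $\Pr[X > s] \le \frac{E[X]}{s} < \frac{1}{2}$, so the Repeat loop runs at most twice in expectation.)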
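The two-level construction can be sketched as follows, again assuming the family $h_x(i) = (ix \bmod p) \bmod n$ with $c = 2$ and a prime $p > m$ (assumptions, not fixed by the slides). `build_two_level`, `perfect_table` and `search` are hypothetical names. A primary slot stores nothing, the single element of its list, or a secondary perfect table of size $c\,s_i^2$, so a search makes at most two probes.

```python
import random
from math import isqrt
from typing import List, Optional

def next_prime(q: int) -> int:
    """Smallest prime strictly larger than q (trial division)."""
    q += 1
    while q < 2 or any(q % d == 0 for d in range(2, isqrt(q) + 1)):
        q += 1
    return q

def h(x: int, p: int, n: int, i: int) -> int:
    """h_x(i) = (ix mod p) mod n."""
    return (i * x % p) % n

def perfect_table(B: List[int], p: int, c: int, rng: random.Random):
    """Secondary table for list B: size c|B|^2, retry h_x until B has no collision."""
    n2 = c * len(B) ** 2
    while True:
        x2 = rng.randint(1, p - 1)
        table: List[Optional[int]] = [None] * n2
        for i in B:
            k = h(x2, p, n2, i)
            if table[k] is not None:
                break
            table[k] = i
        else:
            return x2, n2, table

def build_two_level(S: List[int], m: int, c: int = 2, rng: Optional[random.Random] = None):
    if rng is None:
        rng = random.Random()
    s, p = len(S), next_prime(m)
    n = c * s                                    # Fix n = c s
    while True:                                  # Repeat ... Until X <= s
        x = rng.randint(1, p - 1)
        lists: List[List[int]] = [[] for _ in range(n)]
        for i in S:
            lists[h(x, p, n, i)].append(i)
        X = sum(len(L) * (len(L) - 1) // 2 for L in lists)
        if X <= s:
            break
    # Primary slot: None (empty), the element itself (size 1), or a perfect table (size > 1).
    T = [perfect_table(L, p, c, rng) if len(L) > 1 else (L[0] if L else None) for L in lists]
    return x, p, n, T

def search(D, j: int) -> bool:
    x, p, n, T = D
    slot = T[h(x, p, n, j)]
    if slot is None or isinstance(slot, int):
        return slot == j
    x2, n2, table = slot
    return table[h(x2, p, n2, j)] == j           # second probe: worst-case O(1)

rng = random.Random(7)
S = rng.sample(range(1, 10**6 + 1), 300)
D = build_two_level(S, 10**6, rng=rng)
print(all(search(D, j) for j in S))
```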
[Figure: primary table $T$ with slots $0, 1, 2, \dots, n-1$; each slot whose list has more than one element points to its own secondary hash table.]
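Let $s_i$ be the number of elements in $T[i]$.
Extra space required (each list with $s_i > 1$ gets a perfect hash table of size $c\,s_i^2$):
$\sum_{i<n \text{ and } s_i>1} c\,s_i^2$
Is there any relation between $X$ and the $s_i$'s?
$X = \sum_{i<n \text{ and } s_i>1} \frac{s_i(s_i-1)}{2}$
$\Rightarrow \sum_{i<n \text{ and } s_i>1} s_i^2 - \sum_{i<n \text{ and } s_i>1} s_i = 2X$
$\Rightarrow \sum_{i<n \text{ and } s_i>1} s_i^2 \le 2X + s \le 3s$ (using $X \le s$ and $\sum_i s_i = s$)
$\Rightarrow$ Extra space $\le 3cs = O(s)$.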
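A quick numeric check of this space bound, sketched under the assumption $c = 2$. The list sizes come from a hypothetical primary table in which $s$ keys are placed uniformly at random into $n = s$ slots, and `secondary_space` is an illustrative name.

```python
import random
from collections import Counter
from typing import Iterable, Tuple

def secondary_space(sizes: Iterable[int], c: int = 2) -> Tuple[int, int]:
    """Given list sizes s_i of a primary table, return (X, extra space), where
    X = sum of s_i(s_i - 1)/2 and extra space = sum of c * s_i^2 over lists with s_i > 1."""
    big = [b for b in sizes if b > 1]      # lists of size 0 or 1 add nothing to X or to the space
    X = sum(b * (b - 1) // 2 for b in big)
    sum_sq = sum(b * b for b in big)
    assert sum_sq - sum(big) == 2 * X      # the identity sum s_i^2 - sum s_i = 2X
    return X, c * sum_sq

# Hypothetical primary table: s keys placed into n = s slots uniformly at random.
rng = random.Random(0)
s = 1000
sizes = Counter(rng.randrange(s) for _ in range(s)).values()
X, extra = secondary_space(sizes)
if X <= s:                      # the condition enforced by the Repeat ... Until loop
    assert extra <= 3 * 2 * s   # extra space <= 3cs with c = 2
print(X, extra)
```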
Theorem:
A given set $S$ can be preprocessed in expected $O(s)$ time to build a data structure (2-level hash table) of $O(s)$ size such that any search query can be answered in worst case O(1) time.
WHY SUCH A DEFINITION FOR
UNIVERSAL HASH FAMILY ?
Why does hashing work so well in practice?
A simple hash function: $h(i) = i \bmod n$.
• $h$ works so well in practice because the set $S$ is usually a uniformly random subset of $U$.
As a result,
$\Pr_{i,j \in_r U}[h(i) = h(j)] \le \frac{c}{n}$
• However, it is easy to fool this hash function: if every element of $S$ is a multiple of $n$, all of them land in $T[0]$ and a search takes O(s) time.
This makes us think:
“Can we achieve expected O(1) search time for any given set $S$?”
We faced a similar question with Quick Sort, and the answer there was Randomized Quick Sort.
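The attack on $h(i) = i \bmod n$ is easy to reproduce. A minimal sketch, assuming the adversary picks the multiples of $n$ (a hypothetical choice of $S$); for contrast it also hashes the same set with a randomly chosen $h_x(i) = (ix \bmod p) \bmod n$ from the family introduced later in this lecture, using $p = 1000003$, a prime above every key.

```python
import random
from collections import Counter

n = 1000
S = [k * n for k in range(1, n + 1)]     # adversarial set: every element is a multiple of n
slots = Counter(i % n for i in S)        # simple hash function h(i) = i mod n
print(slots)                             # Counter({0: 1000})

p = 1000003                              # a prime larger than every element of S
x = random.randint(1, p - 1)             # h_x chosen at random, independently of S
largest = max(Counter((i * x % p) % n for i in S).values())
print(largest)                           # compare with the 1000 collisions above
```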
Universal Hash Family
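A simple hash function: $h(i) = i \bmod n$.
$\Pr_{i,j \in_r U}[h(i) = h(j)] \le \frac{c}{n}$ (the elements are random, the hash function is fixed)
Definition: A collection $\mathcal{H}$ of hash functions is said to be universal if there exists a constant $c$ such that for any $i, j \in U$ with $i \ne j$,
$\Pr_{h \in_r \mathcal{H}}[h(i) = h(j)] \le \frac{c}{n}$ (the elements are fixed, the hash function is random)
The definition thus moves the randomness from the input to the choice of hash function, just as Randomized Quick Sort moves it to the choice of pivot.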
A SIMPLE AND COMPACT
UNIVERSAL HASH FAMILY
The starting point
The simple hash function: $h(i) = i \bmod n$.
Problem: Two elements $i, j \in S$ are bound to collide if $n$ divides $|i - j|$.
Is there some operation which, when applied over any $S$, distributes $|i - j|$ uniformly at random over $\{0, 1, \dots, n-1\}$?
mod operation
$a, b$: non-negative integers
$t$: a positive integer
$a \bmod t \in \{0, 1, \dots, t-1\}$.
Question: How is $|a \bmod t - b \bmod t|$ related to $|a - b| \bmod t$?
Consider some examples:
• $|55 \bmod 31 - 43 \bmod 31| = 12$ and $|55 - 43| \bmod 31 = 12$
• $|91 \bmod 31 - 102 \bmod 31| = 20$ and $|91 - 102| \bmod 31 = 11$
Answer: Let $r = |a - b| \bmod t$. Then $|a \bmod t - b \bmod t| \in \{r, t - r\}$.
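The Answer can be checked by brute force. `mod_gap` is a hypothetical helper, and the exhaustive loop only covers small values, so it illustrates the claim rather than proving it.

```python
def mod_gap(a: int, b: int, t: int) -> int:
    """|a mod t - b mod t|"""
    return abs(a % t - b % t)

print(mod_gap(55, 43, 31), abs(55 - 43) % 31)    # 12 12
print(mod_gap(91, 102, 31), abs(91 - 102) % 31)  # 20 11

# Exhaustive check on small values: the gap is r or t - r, where r = |a - b| mod t.
for t in range(1, 40):
    for a in range(120):
        for b in range(120):
            r = abs(a - b) % t
            assert mod_gap(a, b, t) in (r, t - r)
```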
mod operation
$p$: a prime number
$A$: $\{1, 2, \dots, p-1\}$
Consider any $k \in A$.
Question: What can we say about the set $A_k = \{ki \bmod p \mid i \in A\}$?
Example: $p = 7$, $k = 3$ and $k = 4$.
i        : 1 2 3 4 5 6
3i mod 7 : 3 6 2 5 1 4
4i mod 7 : 4 1 5 2 6 3
Fact: $A_k = A$ for all $k \in A$.
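Proof: Suppose $ki \bmod p = kj \bmod p$ for some $i, j \in A$ with $i \ne j$.
$\Rightarrow$ $p$ divides $(ki - kj)$
$\Rightarrow$ $p$ divides $k(i - j)$
$\Rightarrow$ $p$ divides $k$ or $p$ divides $(i - j)$ (as $p$ is prime)
Not possible, since $1 \le k \le p-1$ and $0 < |i - j| < p$.
So $i \mapsto ki \bmod p$ is one-to-one on $A$, and since $ki \bmod p \ne 0$, it maps $A$ onto $A$.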
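The Fact is easy to test numerically. This sketch, with the hypothetical helper `scaled`, reproduces the two rows of the table, checks a few primes exhaustively, and shows why primality matters: for the composite modulus 8, the map $i \mapsto 2i \bmod 8$ is not one-to-one.

```python
def scaled(k: int, p: int) -> list:
    """The sequence k*i mod p for i = 1, ..., p-1."""
    return [k * i % p for i in range(1, p)]

print(scaled(3, 7))  # [3, 6, 2, 5, 1, 4]
print(scaled(4, 7))  # [4, 1, 5, 2, 6, 3]

# For prime p, every k in A = {1, ..., p-1} yields a permutation of A.
for p in (2, 3, 5, 7, 11, 13, 101):
    for k in range(1, p):
        assert sorted(scaled(k, p)) == list(range(1, p))

# With the composite modulus 8, multiplying by 2 loses values (and even produces 0).
print(sorted(set(scaled(2, 8))))  # [0, 2, 4, 6]
```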
Question: If $x \in_r A$, then what can we say about $(xk \bmod p)$?
Answer: It is distributed uniformly at random over $A$, because by the Fact, $x \mapsto xk \bmod p$ is a one-to-one map of $A$ onto $A$.
Can you now see that the above answer plays the key role in formulating the hash function $h_x(i) = (ix \bmod p) \bmod n$?
Good fact:
An element $i$ is mapped to a random element of $\{0, \dots, n-1\}$.
Slightly bad fact:
Once element $i$ is mapped to a location, the mapping of $i + \Delta$ is no longer random (both are determined by the same $x$).
So it is not clear whether $|h_x(i + \Delta) - h_x(i)|$ is distributed uniformly at random over $\{0, \dots, n-1\}$.
…So let us look at $h_x()$ a bit more closely…
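Probability of collision between $i$ and $i + \Delta$
Let $h_x(i) = (ix \bmod p) \bmod n$.
$i$ and $i + \Delta$ will collide under $h_x$ if and only if $|ix \bmod p - (i + \Delta)x \bmod p|$ is divisible by $n$.
Question: What is the relation between $|ix \bmod p - (i + \Delta)x \bmod p|$ and $\Delta x \bmod p$?
Answer: $|ix \bmod p - (i + \Delta)x \bmod p|$ is either $\Delta x \bmod p$ or $p - (\Delta x \bmod p)$ (apply the answer on the mod operation with $a = (i + \Delta)x$, $b = ix$ and $t = p$).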
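Lemma: If $i$ and $i + \Delta$ collide under $h_x$, then either $\Delta x \bmod p$ is divisible by $n$ or $p - (\Delta x \bmod p)$ is divisible by $n$.
Note: students must realize that this is a necessary condition and not a sufficient condition for collision. To get an idea, study the example given at the end of this lecture.
Assume $p$ is a prime larger than $m$, so that $1 \le \Delta \le p-1$. Then, by the Fact above, $\{\Delta x \bmod p \mid 1 \le x \le p-1\} = \{1, \dots, p-1\}$.
Let $x \in_r \{1, \dots, p-1\}$. Then $\Delta x \bmod p$ and $p - (\Delta x \bmod p)$ are each uniformly distributed over $\{1, \dots, p-1\}$, and
Probability of collision between $i$ and $i + \Delta$
$\le \Pr(\Delta x \bmod p \text{ is divisible by } n \text{ or } p - (\Delta x \bmod p) \text{ is divisible by } n)$
$\le 2 \Pr(\Delta x \bmod p \text{ is divisible by } n)$
$= 2 \cdot \frac{\lfloor (p-1)/n \rfloor}{p-1} \le \frac{2}{n}$.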
Theorem:
Let $h_x(i) = (ix \bmod p) \bmod n$, where $p$ is a prime larger than $m$. Then $\mathcal{H} = \{h_x \mid 1 \le x \le p-1\}$ is universal (with $c = 2$).
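The Theorem can be sanity-checked by enumerating every $x$ for small parameters. A sketch, assuming the universe $\{1, \dots, p-1\}$ so that $p > m$ holds; it computes exact probabilities with `fractions.Fraction` and compares the worst pair against $2/n$. The helper names are hypothetical.

```python
from fractions import Fraction

def h(x: int, i: int, p: int, n: int) -> int:
    """h_x(i) = (ix mod p) mod n."""
    return (i * x % p) % n

def collision_prob(i: int, j: int, p: int, n: int) -> Fraction:
    """Exact Pr[h_x(i) = h_x(j)] for x uniform in {1, ..., p-1}."""
    hits = sum(1 for x in range(1, p) if h(x, i, p, n) == h(x, j, p, n))
    return Fraction(hits, p - 1)

def worst_pair(m: int, p: int, n: int) -> Fraction:
    """Largest collision probability over all pairs i < j in U = {1, ..., m}."""
    return max(collision_prob(i, j, p, n)
               for i in range(1, m + 1) for j in range(i + 1, m + 1))

for p, n in [(7, 4), (31, 5), (53, 8)]:
    w = worst_pair(p - 1, p, n)          # universe {1, ..., p-1}, so p > m
    print(p, n, w, w <= Fraction(2, n))
```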
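Example
$p = 7$, $n = 4$.
$h_x(i) = (ix \bmod 7) \bmod 4$.
Observe that $\lfloor (p-1)/n \rfloor = \lfloor 6/4 \rfloor = 1$: only one multiple of 4, namely 4, lies in $\{1, \dots, 6\}$.
Question: How many collisions between 2 and 3?
Answer: Two (for $x = 3, 4$).
Here $\Delta = 1$: $\Delta x \bmod 7 = 4$ for $x = 4$,
and $7 - (\Delta x \bmod 7) = 4$ for $x = 3$.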
Question: How many collisions between 2 and 4?
Answer: No collisions!
(although $\Delta x \bmod 7 = 4$ for $x = 2$ here.)
Table storing $ix \bmod 7$ (row: $i$, column: $x$):
i\x | 1 2 3 4 5 6
 1  | 1 2 3 4 5 6
 2  | 2 4 6 1 3 5
 3  | 3 6 2 5 1 4
 4  | 4 1 5 2 6 3
 5  | 5 3 1 6 4 2
 6  | 6 5 4 3 2 1
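The example can be reproduced in a few lines; `colliding_x` is a hypothetical helper. The last line shows the necessary-but-not-sufficient condition at work for the pair (2, 4): at $x = 2$, $\Delta x \bmod 7 = 4$ is divisible by $n$, yet the two elements land in different slots.

```python
p, n = 7, 4

def h(x: int, i: int) -> int:
    """h_x(i) = (ix mod 7) mod 4."""
    return (i * x % p) % n

def colliding_x(i: int, j: int) -> list:
    """All x in {1, ..., 6} for which i and j collide under h_x."""
    return [x for x in range(1, p) if h(x, i) == h(x, j)]

print(colliding_x(2, 3))  # [3, 4]
print(colliding_x(2, 4))  # []

delta, x = 2, 2                          # pair (2, 4) at x = 2
print(delta * x % p, h(x, 2), h(x, 4))   # 4 0 1
```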
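Homework:
Let $h_{x,y}(i) = ((ix + y) \bmod p) \bmod n$.
Then prove that $\mathcal{H} = \{h_{x,y} \mid 1 \le x \le p-1,\ 0 \le y \le p-1\}$ is universal.
In particular, show that for any $i, j \in U$ with $i \ne j$,
$\Pr_{h \in_r \mathcal{H}}[h(i) = h(j)] \le \frac{1}{n}$.
Hence this family is slightly better than the hash family discussed just now, whose bound is $\frac{2}{n}$.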