Lecture 12 - CSE

Randomized Algorithms
CS648
Lecture 12
Hashing - II
RECAP OF LAST LECTURE
Problem Definition
• $U = \{1, 2, \ldots, m\}$, called the universe
• $S \subseteq U$ and $s = |S|$
• $s \ll m$
Example: $m = 10^{18}$, $s = 10^{3}$
Aim
Given a set $S$, build a data structure storing $S$ such that we can answer in O(1) time:
"Does $i \in S$?" for any given $i \in U$.
Hashing
• Hash table:
$T$: an array of size $n$.
• Hash function:
$h : U \rightarrow [n]$, where $[n] = \{0, 1, \ldots, n-1\}$
Answering a query "Does $i \in S$?":
1. $k \leftarrow h(i)$;
2. Search the list stored at $T[k]$.
Properties of $h$:
• $h(i)$ is computable in O(1) time.
• Space required by $h$: O(1).
Question: How many bits are needed to encode $h$?
[Figure: hash table $T$ with slots $0, 1, \ldots, n-1$, storing the elements of $S$]
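The query procedure above can be sketched in a few lines of Python. This is only an illustration: the helper names `build_chained_table` and `query`, the sample set, and the placeholder hash function `i % n` are assumptions made for the example and do not come from the lecture.

```python
from typing import Callable, Iterable, List


def build_chained_table(S: Iterable[int], n: int, h: Callable[[int], int]) -> List[List[int]]:
    """Store every element of S in the list kept at slot h(element) of T."""
    T: List[List[int]] = [[] for _ in range(n)]
    for element in S:
        T[h(element)].append(element)
    return T


def query(T: List[List[int]], h: Callable[[int], int], i: int) -> bool:
    """Answer "Does i belong to S?" by scanning only the list at T[h(i)]."""
    k = h(i)
    return i in T[k]


S = [3, 17, 42, 1000003]
n = 5
h = lambda i: i % n  # placeholder hash function, chosen only for illustration
T = build_chained_table(S, n, h)
print(query(T, h, 42), query(T, h, 43))  # True False
```

The cost of `query` is proportional to the length of the list at `T[k]`, which is why the rest of the lecture focuses on keeping those lists short.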
Collision
Definition: Two elements $i, j \in U$ are said to collide under hash function $h$ if $h(i) = h(j)$.
Worst case time complexity of searching an item $i$: the number of elements in $S$ colliding with $i$.
[Figure: hash table $T$ with slots $0, 1, \ldots, n-1$]
Universal Hash Family
Definition: A collection $H$ of hash functions is said to be universal if there exists a constant $c$ such that for any two distinct elements $i, j \in U$,
$$P_{h \in_r H}[h(i) = h(j)] \le \frac{c}{n}$$
Here $h \in_r H$ means that $h$ is chosen uniformly at random from $H$.
This definition appears strange in the beginning! But we shall soon see that there is a very natural way to arrive at this definition.
Perfect hashing using O($s^2$) space
Let $H$ be a universal hash family.
Let $X$ be the number of collisions for $S$ when $h \in_r H$.
Question: What is $E[X]$?
For $i, j \in S$ with $i < j$, define
$$X_{i,j} = \begin{cases} 1 & \text{if } h(i) = h(j) \\ 0 & \text{otherwise} \end{cases}$$
Then
$$X = \sum_{i<j,\ i,j \in S} X_{i,j}$$
$$E[X] = \sum_{i<j,\ i,j \in S} E[X_{i,j}] = \sum_{i<j,\ i,j \in S} P[X_{i,j} = 1] \le \sum_{i<j,\ i,j \in S} \frac{c}{n} = \frac{c}{n} \cdot \frac{s(s-1)}{2}$$
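The bound on $E[X]$ can be checked numerically on a small instance. The sketch below assumes the family $h_x(j) = (jx \bmod p) \bmod n$ that is constructed later in this lecture (for which $c = 2$); the prime `p`, the table size `n`, and the set `S` are arbitrary values picked for illustration.

```python
from itertools import combinations


def count_collisions(S, h):
    """X = number of pairs i < j in S with h(i) == h(j)."""
    return sum(1 for i, j in combinations(sorted(S), 2) if h(i) == h(j))


p, n = 101, 10                                   # assumed: p prime, n the table size
S = [4, 9, 15, 23, 42, 57, 68, 77, 80, 96]       # arbitrary subset of {1, ..., p-1}
s = len(S)

# Averaging X over every x in {1, ..., p-1} gives the exact E[X] for a uniformly random x.
total = sum(count_collisions(S, lambda j, x=x: (j * x % p) % n) for x in range(1, p))
expected_X = total / (p - 1)
bound = (2 / n) * s * (s - 1) / 2                # (c/n) * s(s-1)/2 with c = 2
print(expected_X, bound)
```

Because the average runs over the whole family, the printed value is the true expectation for this instance, not a sampling estimate.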
Lemma 1: $E[X] \le \frac{c}{n} \cdot \frac{s(s-1)}{2}$
Lemma 2: For $n = cs^2$, there will be no collision with probability at least $\frac{1}{2}$.
(With $n = cs^2$, Lemma 1 gives $E[X] \le \frac{s-1}{2s} < \frac{1}{2}$, so by Markov's inequality $P[X \ge 1] < \frac{1}{2}$.)
Algorithm 1: Perfect hashing for $S$
Fix $n = cs^2$;
Repeat
1. Pick $h \in_r H$;
2. $t \leftarrow$ the number of collisions for $S$ under $h$.
Until $t = 0$.
Build the hash table.
Theorem: A perfect hash function can be computed for $S$ in expected O($s^2$) time.
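A minimal sketch of Algorithm 1 follows, under stated assumptions: it uses the family $h_x(j) = (jx \bmod p) \bmod n$ from later in this lecture with $c = 2$, takes `P` to be the Mersenne prime $2^{31} - 1$, and assumes the keys are distinct integers in $\{1, \ldots, P-1\}$. The names `build_perfect_table` and `contains` are hypothetical.

```python
import random
from typing import List, Optional, Tuple

P = 2_147_483_647  # assumed prime (2^31 - 1) larger than every key


def build_perfect_table(S: List[int], c: int = 2, p: int = P) -> Tuple[int, List[Optional[int]]]:
    """Retry random x until h_x(j) = (j*x mod p) mod n has no collision on S.

    Assumes the keys in S are distinct and lie in {1, ..., p-1}.
    """
    n = max(1, c * len(S) ** 2)
    while True:
        x = random.randrange(1, p)             # pick h uniformly from the family
        table: List[Optional[int]] = [None] * n
        collision = False
        for key in S:
            slot = (key * x % p) % n
            if table[slot] is not None:        # t > 0: discard this h and retry
                collision = True
                break
            table[slot] = key
        if not collision:
            return x, table


def contains(x: int, table: List[Optional[int]], key: int, p: int = P) -> bool:
    """Worst-case O(1) lookup: one hash evaluation and one comparison."""
    return table[(key * x % p) % len(table)] == key


S = [12, 7, 1000, 123456, 99]
x, table = build_perfect_table(S)
print(all(contains(x, table, k) for k in S), contains(x, table, 5))  # True False
```

By Lemma 2 each attempt succeeds with probability at least 1/2, so the expected number of iterations of the `while` loop is at most 2.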
HASHING WITH OPTIMAL SPACE AND
WORST CASE O(1) SEARCH TIME
Optimal space hashing with worst case O(1) search time
Let $H$ be a universal hash family.
Let $X$ be the number of collisions for $S$ when $h \in_r H$.
Lemma 1: $E[X] \le \frac{c}{n} \cdot \frac{s(s-1)}{2}$.
Question: What does this bound on $E[X]$ become when $n = s$?
Answer: $E[X] \le c \cdot \frac{s-1}{2}$.
Lemma 1: $E[X] \le \frac{s-1}{2}$ when $n = cs$.
Algorithm:
Fix $n = cs$;
Repeat
1. Pick $h \in_r H$;
2. $t \leftarrow$ no. of collisions for $S$ under $h$;
Until $t \le s$; // by Markov's inequality, each attempt succeeds with probability more than 1/2
Build the hash table; // primary hash table
For each $0 \le i < n$
    If size of list $T[i] > 1$
        1. Build a perfect hash table for list $T[i]$;
        2. Make $T[i]$ point to this hash table;
[Figure: primary hash table $T$ with slots $0, 1, \ldots, n-1$]
Lemma 1: $E[X] \le \frac{s-1}{2}$ when $n = cs$.
[Figure: primary hash table $T$ with slots $0, 1, \ldots, n-1$, and a secondary hash table attached to a slot]
Let $s_i$ be the number of elements in $T[i]$.
Extra space required: $\sum_{i<n,\ s_i>1} c\, s_i^2$
Question: Is there any relation between $X$ and the $s_i$'s?
$$X = \sum_{i<n,\ s_i>1} \frac{s_i(s_i-1)}{2}$$
$$\Rightarrow \sum_{i<n,\ s_i>1} s_i^2 = 2X + \sum_{i<n,\ s_i>1} s_i$$
$$\Rightarrow \sum_{i<n,\ s_i>1} s_i^2 \le 2s + s = 3s$$
The last step uses $X \le s$, which the algorithm guarantees before it builds the primary table, and $\sum_i s_i = s$. Hence the extra space is O($s$).
Theorem:
A given set $S$ can be preprocessed in expected O($s$) time to build a data structure (2-level hash table) of O($s$) size such that any search query can be answered in worst case O(1) time.
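The two-level construction can be sketched as follows. Assumptions made for this sketch: the family is $h_x(j) = (jx \bmod p) \bmod n$ with $c = 2$, `P` is the Mersenne prime $2^{31} - 1$, and the keys are distinct integers in $\{1, \ldots, P-1\}$. Unlike the algorithm above, which builds secondary tables only for lists of size greater than 1, this sketch gives every slot a (possibly tiny) secondary table to keep the lookup code uniform; the function names are hypothetical.

```python
import random

P = 2_147_483_647  # assumed prime (2^31 - 1) larger than every key
C = 2              # universality constant of the family h_x used here


def h(x, n, key):
    return (key * x % P) % n


def build_two_level(S):
    """Primary table of size C*s; each bucket gets a collision-free secondary table."""
    s = len(S)
    n = max(1, C * s)
    while True:                                   # primary level: accept h once X <= s
        x = random.randrange(1, P)
        buckets = [[] for _ in range(n)]
        for key in S:
            buckets[h(x, n, key)].append(key)
        X = sum(len(b) * (len(b) - 1) // 2 for b in buckets)
        if X <= s:
            break
    secondary = []
    for b in buckets:                             # secondary level: size C * s_i^2
        size = max(1, C * len(b) ** 2)
        while True:
            y = random.randrange(1, P)
            slots = [None] * size
            ok = True
            for key in b:
                k = h(y, size, key)
                if slots[k] is not None:          # collision: retry with a new y
                    ok = False
                    break
                slots[k] = key
            if ok:
                break
        secondary.append((y, slots))
    return x, n, secondary


def lookup(structure, key):
    """Two hash evaluations and one comparison, regardless of S."""
    x, n, secondary = structure
    y, slots = secondary[h(x, n, key)]
    return slots[h(y, len(slots), key)] == key


S = list(range(1, 2000, 7))
structure = build_two_level(S)
total_slots = sum(len(slots) for _, slots in structure[2])
print(all(lookup(structure, k) for k in S), lookup(structure, 2), total_slots <= 8 * len(S))
```

With $X \le s$ the non-empty buckets need at most $C(2X + s) \le 6s$ secondary slots, and the empty ones add at most $n = 2s$ more, which is where the `8 * len(S)` check comes from.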
WHY SUCH A DEFINITION FOR
UNIVERSAL HASH FAMILY ?
Why does hashing work so well in practice?
A simple hash function: $h(i) = i \bmod n$.
• $h$ works so well in practice because the set $S$ is usually a uniformly random subset of $U$. As a result,
$$P_{i,j \in_r U}[h(i) = h(j)] \le \frac{c}{n}$$
• It is easy to fool this hash function so that a search takes O($s$) time.
This makes us think:
"Can we achieve expected O(1) search time for any given set $S$?"
(We faced a similar question while going from Quick Sort to Randomized Quick Sort.)
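To see how easily $h(i) = i \bmod n$ is fooled, the sketch below compares a well-spread set with a set consisting only of multiples of $n$. The helper name `longest_chain` and both sample sets are assumptions made for the illustration.

```python
def longest_chain(S, n):
    """Length of the longest list when S is hashed with h(i) = i mod n."""
    counts = [0] * n
    for i in S:
        counts[i % n] += 1
    return max(counts)


n = 100
spread = list(range(1, 101))                    # consecutive keys: one per slot
adversarial = [n * k for k in range(1, 101)]    # multiples of n: every key lands in slot 0
print(longest_chain(spread, n), longest_chain(adversarial, n))  # 1 100
```

Both sets have $s = 100$ elements, yet a search in the second one may scan all $s$ of them.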
Universal Hash Family
A simple hash function: $h(i) = i \bmod n$. Here the randomness is over the pair of elements:
$$P_{i,j \in_r U}[h(i) = h(j)] \le \frac{c}{n}$$
Definition: A collection $H$ of hash functions is said to be universal if there exists a constant $c$ such that for any two distinct elements $i, j \in U$,
$$P_{h \in_r H}[h(i) = h(j)] \le \frac{c}{n}$$
Here the randomness is over the choice of the hash function.
A SIMPLE AND COMPACT
UNIVERSAL HASH FAMILY
The starting point
The simple hash function: $h(i) = i \bmod n$.
Problem: Two elements $i, j \in S$ are bound to collide if $n$ divides $|j - i|$.
Is there some operation which, when applied over any $S$, distributes $|j - i|$ uniformly at random over $\{0, 1, \ldots, n-1\}$?
mod operation
$i$: a non-negative integer
$t$: a positive integer
$i \bmod t \in \{0, 1, \ldots, t-1\}$.
Question: How is $|i \bmod t - j \bmod t|$ related to $|i - j| \bmod t$?
Consider some examples:
• $|55 \bmod 31 - 43 \bmod 31| = 12$ and $|55 - 43| \bmod 31 = 12$
• $|91 \bmod 31 - 102 \bmod 31| = 20$ and $|91 - 102| \bmod 31 = 11$
Answer: Let $k = |i - j| \bmod t$. Then $|i \bmod t - j \bmod t| \in \{k, t - k\}$.
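The two examples and the general answer can be checked mechanically. The sketch below is a brute-force test over a small range, not a proof; the helper names `residue_gap` and `claim_holds` are hypothetical.

```python
def residue_gap(i, j, t):
    """|i mod t - j mod t| for non-negative i, j and positive t."""
    return abs(i % t - j % t)


def claim_holds(i, j, t):
    """True when the gap equals k or t - k, where k = |i - j| mod t."""
    k = abs(i - j) % t
    return residue_gap(i, j, t) in (k, t - k)


print(residue_gap(55, 43, 31), abs(55 - 43) % 31)     # 12 12
print(residue_gap(91, 102, 31), abs(91 - 102) % 31)   # 20 11

# Exhaustive check for small values of i, j and t.
all_ok = all(claim_holds(i, j, t)
             for t in range(1, 40) for i in range(120) for j in range(120))
print(all_ok)  # True
```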
mod operation
$p$: a prime number
$A = \{1, 2, \ldots, p-1\}$
Consider any $i \in A$.
Question: What can we say about the set $A_i = \{ij \bmod p \mid j \in A\}$?
Example: $p = 7$, with $i = 3$ and $i = 4$.

j           1  2  3  4  5  6
3j mod 7    3  6  2  5  1  4
4j mod 7    4  1  5  2  6  3

Fact: $A_i = A$ for all $i \in A$.
Proof: Take $j, k \in A$ with $j \ne k$. Then
$ij \bmod p = ik \bmod p$
⇔ $p$ divides $(ij - ik)$
⇔ $p$ divides $i(j - k)$
⇔ $p$ divides $i$ or $p$ divides $(j - k)$ (since $p$ is prime)
Not possible, because $1 \le i \le p-1$ and $0 < |j - k| < p$. So the $p - 1$ values $ij \bmod p$ are pairwise distinct, and none of them is 0 since $p$ divides neither $i$ nor $j$. Hence $A_i = A$.
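The fact is easy to confirm by direct computation for small primes. The sketch below also shows that primality matters; the helper name `multiples_mod` and the composite counterexample (modulus 6, $i = 2$) are assumptions added for illustration.

```python
def multiples_mod(i, p):
    """The set A_i = { i*j mod p : j in {1, ..., p-1} }."""
    return {(i * j) % p for j in range(1, p)}


p = 7
A = set(range(1, p))
print([(3 * j) % p for j in range(1, p)])   # [3, 6, 2, 5, 1, 4]
print([(4 * j) % p for j in range(1, p)])   # [4, 1, 5, 2, 6, 3]

# A_i = A for every i in A, for a few small primes.
fact_holds = all(multiples_mod(i, q) == set(range(1, q))
                 for q in (2, 3, 5, 7, 11, 13) for i in range(1, q))
print(fact_holds)                           # True

# With a composite modulus the fact fails: 2*j mod 6 only takes the values 0, 2, 4.
print(sorted(multiples_mod(2, 6)))          # [0, 2, 4]
```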
mod operation
$p$: a prime number
$A = \{1, 2, \ldots, p-1\}$
Fact: $A_i = A$ for all $i \in A$, where $A_i = \{ij \bmod p \mid j \in A\}$.
Question: If $x \in_r A$, then what can we say about $(xi \bmod p)$?
Answer: It is distributed uniformly at random over $A$.
Can you now see that the above answer plays the key role in formulating the hash function $h_x(i) = (ix \bmod p) \bmod n$?
$h_x(i) = (ix \bmod p) \bmod n$
[Figure: the elements $1, 2, \ldots, i, \ldots, i + \Delta, \ldots, m$ of the universe, with $m = p - 1$, each mapped to $ix \bmod p$]
Good fact:
An element $i$ is mapped to a random element in $\{0, \ldots, n-1\}$.
Slightly bad fact:
Once element $i$ is mapped to a location, the mapping of $i + \Delta$ is no longer random.
So it is not clear whether $|h_x(i + \Delta) - h_x(i)|$ is distributed uniformly at random over $\{0, \ldots, n-1\}$.
…So let us look at $h_x()$ a bit more closely…
Probability of collision between $i$ and $i + \Delta$
Let $h_x(j) = (jx \bmod p) \bmod n$.
$i$ and $i + \Delta$ will collide under $h_x$ if $|ix \bmod p - (i + \Delta)x \bmod p|$ is divisible by $n$.
Question: What is the relation between $|ix \bmod p - (i + \Delta)x \bmod p|$ and $\Delta x \bmod p$?
Answer: $|ix \bmod p - (i + \Delta)x \bmod p|$ is either $\Delta x \bmod p$ or $p - \Delta x \bmod p$.
Lemma: If $i$ and $i + \Delta$ collide under $h_x$, then either $\Delta x \bmod p$ is divisible by $n$ or $p - \Delta x \bmod p$ is divisible by $n$.
Note: Students must realize that this is a necessary condition and not a sufficient condition for collision. To get an idea, study the example given at the last slide of this lecture.
$\{\Delta x \bmod p \mid 1 \le x \le p-1\} = \{1, \ldots, p-1\}$
Let $x \in_r \{1, \ldots, p-1\}$.
Probability of collision between $i$ and $i + \Delta$
$\le P(\Delta x \bmod p$ is divisible by $n$ or $p - \Delta x \bmod p$ is divisible by $n)$
$= 2\, P(\Delta x \bmod p$ is divisible by $n)$
(the two events are disjoint for $1 < n < p$ because $p$ is prime, and equally likely because $y \mapsto p - y$ is a bijection on $\{1, \ldots, p-1\}$)
$= 2 \cdot \frac{\lfloor (p-1)/n \rfloor}{p-1}$
$\le \frac{2}{n}$
Theorem:
Let $h_x(j) = (jx \bmod p) \bmod n$. Then $H = \{h_x \mid 1 \le x \le p-1\}$ is universal (with $c = 2$).
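For small parameters the theorem can be verified exhaustively by counting, for every pair of distinct elements, how many $x$ make them collide. The sketch below does this with exact fractions; the parameter pairs $(p, n)$ are arbitrary choices for illustration and the function names are hypothetical.

```python
from fractions import Fraction


def collision_probability(i, j, p, n):
    """Exact P[h_x(i) == h_x(j)] for x uniform in {1, ..., p-1}."""
    hits = sum(1 for x in range(1, p) if (i * x % p) % n == (j * x % p) % n)
    return Fraction(hits, p - 1)


def worst_pair(p, n):
    """Largest collision probability over all pairs i < j in {1, ..., p-1}."""
    return max(collision_probability(i, j, p, n)
               for i in range(1, p) for j in range(i + 1, p))


for p, n in [(7, 4), (13, 5), (31, 8), (101, 10)]:
    worst = worst_pair(p, n)
    print(p, n, worst, worst <= Fraction(2, n))
```

For every listed pair the final column prints `True`, as the bound $2/n$ requires.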
Example
$p = 7$, $n = 4$.
$h_x(j) = (jx \bmod 7) \bmod 4$.
Observe that $\lfloor \frac{p-1}{n} \rfloor = 1$.
Table storing $jx \bmod 7$:

         j=1  j=2  j=3  j=4  j=5  j=6
x=1       1    2    3    4    5    6
x=2       2    4    6    1    3    5
x=3       3    6    2    5    1    4
x=4       4    1    5    2    6    3
x=5       5    3    1    6    4    2
x=6       6    5    4    3    2    1

Question: How many collisions between 2 and 3?
Answer: Two (for $x = 3, 4$).
Here $\Delta = 1$: $\Delta x \bmod 7 = 4$ for $x = 4$, and $7 - \Delta x \bmod 7 = 4$ for $x = 3$.
Question: How many collisions between 2 and 4?
Answer: No collisions!
(Although, with $\Delta = 2$, $\Delta x \bmod 7 = 4$ for $x = 2$ here.)
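The table and both answers can be reproduced with a short script. The helper name `colliding_x` is an assumption made for this sketch; the parameters are the ones from the example.

```python
p, n = 7, 4


def h(x, j):
    return (j * x % p) % n


def colliding_x(i, j):
    """All x in {1, ..., p-1} for which i and j collide under h_x."""
    return [x for x in range(1, p) if h(x, i) == h(x, j)]


# Rows of the table storing j*x mod 7, one row per x.
for x in range(1, p):
    print(x, [j * x % p for j in range(1, p)])

print(colliding_x(2, 3))   # [3, 4]
print(colliding_x(2, 4))   # []
```

For the pair 2 and 4, $x = 2$ gives $\Delta x \bmod 7 = 4$, a multiple of $n$, yet the hash values are $4 \bmod 4 = 0$ and $1 \bmod 4 = 1$, which illustrates that the condition in the lemma is necessary but not sufficient.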
Homework:
Let $h_{x,y}(j) = (jx \bmod p + y) \bmod n$.
Prove that $H = \{h_{x,y} \mid 1 \le x, y \le p-1\}$ is universal.
In particular, show that for any $i, j \in U$,
$$P_{h \in_r H}[h(i) = h(j)] = \frac{1}{n}$$
Hence it is slightly better than the hash family discussed just now.