GNoC

Optimal Fast Hashing
Yossi Kanizo (Technion, Israel)
Joint work with Isaac Keslassy (Technion, Israel)
and David Hay (Politecnico di Torino, Italy)
Hash Tables for Networking Devices

Hash tables and hash-based structures are
often used in high-speed devices






Heavy-hitter flow identification
Flow state keeping
Flow counter management
Virus signature scanning
IP address lookup algorithms
For hash tables, ideally, 1 memory access per
element insertion

Maximize throughput & minimize power
Hash Tables for Networking Devices


Collisions are unavoidable → wasted memory accesses
For load ≤ 1, let a and d be the average and worst-case time (number of memory accesses) per element insertion


Initially empty buckets
Only insertions (no deletions)
Objective: Minimize a and d
[Figure: memory with buckets 1–9]
Why We Care

On-chip memory:
memory accesses → power consumption

Off-chip memory:
memory accesses → lost on/off-chip pin capacity

Datacenters:
memory accesses → network & server load

Parallelism does not help reduce these costs

d serial or parallel memory accesses have the same cost
Traditional Hash Table Schemes

Example 1: linked lists (chaining)
[Figure: chaining example in memory]
Example 2: linear probing (open addressing)
Problem: the worst-case time cannot be
bounded by a constant d
[Figure: linear probing example in memory]
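A minimal sketch of why linear probing has no constant worst case: it inserts elements into single-element slots and records the number of memory accesses each insertion needs. It assumes ideal uniform hashing, simulated with Python's `random` module; the function name is hypothetical.

```python
import random


def linear_probing_accesses(n, m, seed=0):
    """Insert n elements into a table of m single-element slots using linear
    probing, and return the number of memory accesses each insertion used.

    Uniform hashing is simulated with random.randrange (an assumption).
    """
    if n > m:
        raise ValueError("linear probing needs n <= m")
    rng = random.Random(seed)
    occupied = [False] * m
    accesses = []
    for _ in range(n):
        pos = rng.randrange(m)  # home slot of the new element
        count = 1
        # Probe the following slots (wrapping around) until a free one is found.
        while occupied[pos]:
            pos = (pos + 1) % m
            count += 1
        occupied[pos] = True
        accesses.append(count)
    return accesses


acc = linear_probing_accesses(n=950, m=1000)
print("average a:", sum(acc) / len(acc), "worst case:", max(acc))
```

The worst case depends on the load and on the random hashes; no constant d bounds it, which is why an overflow mechanism is needed in hardware.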
High-Speed Hardware

Enable overflows: if time exceeds d → overflow list



Can be stored in expensive CAM
Otherwise, overflow elements = lost elements
Bucket contains h elements


E.g.: 128-bit memory word → h = 4 elements of 32 bits
Assumption: access cost (read & write of one word) = 1 cycle
[Figure: memory of buckets of height h, with overflow elements stored in a CAM]
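The bucket model can be sketched by packing one bucket into one memory word, using the slide's example of a 128-bit word holding h = 4 elements of 32 bits, so that a single access reads (and writes back) the whole bucket. Encoding empty slots as 0, the list standing in for the CAM, and the helper names are all assumptions for illustration.

```python
import struct

H = 4                # elements per bucket: 128-bit word / 32-bit elements
WORD_FORMAT = "<4I"  # four little-endian unsigned 32-bit integers = 16 bytes


def pack_bucket(elems):
    """Pack up to H nonzero 32-bit elements into one 128-bit word.
    Empty slots are encoded as 0 (assumption: 0 is never a valid element)."""
    if len(elems) > H:
        raise ValueError("bucket holds at most H elements")
    return struct.pack(WORD_FORMAT, *(list(elems) + [0] * (H - len(elems))))


def unpack_bucket(word):
    """Return the stored elements of a 128-bit word, dropping empty slots."""
    return [e for e in struct.unpack(WORD_FORMAT, word) if e != 0]


def try_insert(memory, index, key):
    """One memory access: read the word, add key if there is room, write back.
    Returns False if the bucket is full (the caller retries or overflows)."""
    elems = unpack_bucket(memory[index])
    if len(elems) == H:
        return False
    memory[index] = pack_bucket(elems + [key])
    return True


memory = [pack_bucket([]) for _ in range(8)]  # 8 empty buckets
cam = []                                       # overflow list standing in for the CAM
for key in [11, 22, 33, 44, 55]:
    if not try_insert(memory, 3, key):  # force all keys into bucket 3
        cam.append(key)
print(unpack_bucket(memory[3]), cam)  # [11, 22, 33, 44] [55]
```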
Problem Formulation
Given average time a and worst-case time d,
minimize the overflow rate
[Figure: memory of buckets of height h, with an overflow CAM]
Example: Power of d Random Choices

d hash functions: insert each element into the least-loaded of its d buckets.


Break ties u.a.r. [Azar et al.] or to the left [Vöcking]
Intuition: can reach a low overflow rate
… but average time a = worst-case time d
→ wasted memory accesses
[Figure: d-random-choice insertion into buckets of height h, with an overflow CAM]
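A simulation sketch of a d-choice scheme, here the d-left variant with ties broken to the left [Vöcking]: memory is split into d equal groups, hash i picks a bucket in group i, and the element goes to the least-loaded candidate. Uniform hashes are simulated with `random`, and the function name is hypothetical. Because all d candidate buckets are read before choosing, every insertion costs d accesses, which is the a = d drawback noted on the slide.

```python
import random


def d_left_insert_all(n, m, h, d, seed=0):
    """Insert n elements with a d-left scheme (simulated uniform hashes).

    Memory of m buckets (height h) is split into d equal groups; hash i picks
    a bucket in group i. The element goes to the least-loaded candidate,
    ties broken to the left. All d buckets are read on every insertion.
    Returns (overflow count, total memory accesses).
    """
    assert m % d == 0, "this sketch assumes d divides m"
    rng = random.Random(seed)
    group = m // d
    load = [0] * m
    overflow = 0
    accesses = 0
    for _ in range(n):
        cands = [i * group + rng.randrange(group) for i in range(d)]
        accesses += d
        # min() returns the first minimum, so ties go to the leftmost group
        best = min(cands, key=lambda b: load[b])
        if load[best] < h:
            load[best] += 1
        else:
            overflow += 1
    return overflow, accesses


n, m, h, d = 19_000, 5_000, 4, 4  # load n/(mh) = 0.95
ov, acc = d_left_insert_all(n, m, h, d)
print("overflow rate:", ov / n, "average time a:", acc / n)  # a is exactly d
```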
Main Results

Lower bound on overflow for any scheme

Optimality of three schemes on successively larger ranges:
SIMPLE
GREEDY
MHT (optimal when subtable sizes fall geometrically)

Overflow Lower Bound

Objective: given any online scheme with average time a and worst-case time d, find a lower bound on its overflow rate.
[Plot: overflow lower bound; region that no scheme can achieve vs. capacity region; h=4, load=n/(mh)=0.95, fixed d]
Overflow Lower Bound

Problem: the number of hashes of each element
depends on the instantaneous memory state.

How can we bound the overflow?
[Figure: successive insertions into buckets of height h, with an overflow CAM]
Overflow Lower Bound: Proof Intuition

Assume hashes are uniform. Then relax the constraints:

Offline,
No worst-case d, and
Uncolor the hashes
(n elements) × (a hashes per element) = an uncolored hashes

Lower bound on the expected number of unhashed memory bins
[Figure: uncolored hashes spread over buckets of height h, with an overflow CAM]
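The relaxation can be checked numerically. This Monte Carlo sketch throws the an uncolored hashes uniformly into m buckets of height h: a bucket hit k times can hold at most min(k, h) elements, so the sum of these caps upper-bounds how many elements any scheme with average time a can store, and the remainder must overflow. The function name and the use of `random` to model uniform hashes are assumptions.

```python
import random


def relaxed_overflow_bound(n, m, h, a, seed=0):
    """Monte Carlo estimate of the relaxed (offline, uncolored) bound.

    Throws round(a * n) uncolored hashes uniformly into m buckets of height h.
    A bucket hit by k hashes can store at most min(k, h) elements, so
    1 - sum(min(k, h)) / n lower-bounds the overflow fraction.
    """
    rng = random.Random(seed)
    hits = [0] * m
    for _ in range(round(a * n)):
        hits[rng.randrange(m)] += 1
    max_stored = sum(min(k, h) for k in hits)
    return max(0.0, 1 - max_stored / n)


n, m, h = 19_000, 5_000, 4  # load n/(mh) = 0.95
for a in (1.0, 1.5, 2.0):
    print(a, relaxed_overflow_bound(n, m, h, a))
```

With the same seed, a larger a only appends hashes to the same random sequence, so the estimate never increases with a.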
Overflow Lower Bound

Result: closed-form lower-bound formula on the overflow rate, given n elements in m buckets of height h
Also valid for non-uniform hashes
Defines a capacity region for high-throughput hashing
Lower-Bound Example
For a 3% overflow rate,
throughput can be at most
1/a = 2/3 of the memory rate
[h=4, load=n/(mh)=0.95]
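The talk's exact closed form is not reproduced here. As a hedged sketch, the same relaxation yields a formula under a Poisson approximation (large m, load r = n/(mh) fixed): each bucket receives X ~ Poisson(a·r·h) hashes and stores at most min(X, h) elements, so overflow ≥ 1 - E[min(X, h)] / (r·h), where E[min(X, h)] = h - sum over i = 0..h-1 of (h - i)·P(X = i). The code evaluates this bound and inverts it by bisection to find the smallest average time a compatible with a target overflow rate; the function names are hypothetical.

```python
import math


def overflow_lower_bound(a, h, load):
    """Poisson-approximation lower bound on the overflow fraction.

    Assumes uniform hashes and n, m large with load = n/(m*h) fixed: the
    a*n uncolored hashes hitting one bucket are ~ Poisson(lam), lam = a*load*h,
    and a bucket stores at most min(hits, h) elements.
    """
    lam = a * load * h
    # E[min(X, h)] = h - sum_{i<h} (h - i) * P(X = i)
    deficit = sum((h - i) * math.exp(-lam) * lam**i / math.factorial(i)
                  for i in range(h))
    stored_per_bucket = h - deficit
    return max(0.0, 1 - stored_per_bucket / (load * h))


def min_average_time(target, h, load, lo=1e-9, hi=100.0, iters=100):
    """Smallest a whose bound does not exceed target, found by bisection
    (the bound is nonincreasing in a)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if overflow_lower_bound(mid, h, load) > target:
            lo = mid
        else:
            hi = mid
    return hi


a_min = min_average_time(0.03, h=4, load=0.95)
print(a_min, 1 / a_min)
```

For h = 4 and load 0.95 this approximation gives a between 1.45 and 1.47, i.e. a throughput 1/a of roughly 0.68 to 0.69 of the memory rate, close to the 2/3 quoted on the slide. At a = 1 the bound is about 0.177.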
Overflow Lower Bound

Example: d-left scheme: low overflow rate,
but high average memory access rate a
[h=4, load=n/(mh)=0.95, m=5,000]

The SIMPLE Scheme

SIMPLE scheme: single hash function
Looks like a truncated linked list
Intuition: the final state depends only on the hashes, not on the successive states
→ can uncolor elements
[Figure: SIMPLE insertion into buckets of height h, with an overflow CAM]
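A sketch of SIMPLE, with the hashes supplied as a precomputed list of bucket indices (simulated uniform hashes; names are hypothetical). Replaying the same hashes in a shuffled order leaves the final state unchanged, which is the intuition that lets the proof uncolor elements: each bucket ends with min(hits, h) elements.

```python
import random


def simple_scheme(hashes, m, h):
    """SIMPLE: one hash per element; the element is stored if its bucket has
    room, otherwise it overflows. Returns (bucket loads, overflow count)."""
    load = [0] * m
    overflow = 0
    for b in hashes:
        if load[b] < h:
            load[b] += 1
        else:
            overflow += 1
    return load, overflow


n, m, h = 19_000, 5_000, 4  # load n/(mh) = 0.95
rng = random.Random(0)
hashes = [rng.randrange(m) for _ in range(n)]
load, overflow = simple_scheme(hashes, m, h)

shuffled = hashes[:]
rng.shuffle(shuffled)  # same hashes, different arrival order
load2, overflow2 = simple_scheme(shuffled, m, h)
print(load == load2, overflow / n)  # True, then the overflow rate for a = 1
```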
The SIMPLE Scheme: Proof Intuition
Same reasoning as the offline lower bound
→ Result: for a = 1, SIMPLE is optimal (i.e., it achieves the minimum overflow rate)

Formal proof relies on mean-field analysis (differential equations with a continuous-time fluid limit)
[Figure: SIMPLE memory state when all elements have been hashed; buckets of height h, overflow CAM]
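The fluid-limit idea behind the formal proof can be sketched with a simple Euler integration. Here f[i] is the fraction of buckets holding i elements, time t runs from 0 to 1 as the n elements arrive, and each bucket receives hashes at rate n/m = load · h. These equations are a simplified rendering for SIMPLE under uniform hashes, not necessarily the exact system used in the proof; the function name is hypothetical.

```python
def simple_mean_field(h, load, steps=20_000):
    """Euler integration of a fluid limit for SIMPLE with uniform hashes.

    f[i] is the fraction of buckets holding i elements (0 <= i <= h).
    Time t in [0, 1] is the fraction of the n elements inserted, so each
    bucket is hashed at rate lam = n/m = load * h. Returns the overflow
    fraction at t = 1.
    """
    lam = load * h
    f = [1.0] + [0.0] * h  # all buckets start empty
    dt = 1.0 / steps
    for _ in range(steps):
        df = [0.0] * (h + 1)
        df[0] = -lam * f[0]
        for i in range(1, h):
            df[i] = lam * (f[i - 1] - f[i])
        df[h] = lam * f[h - 1]  # full buckets stay full; later hits overflow
        f = [fi + dt * dfi for fi, dfi in zip(f, df)]
    stored_per_bucket = sum(i * fi for i, fi in enumerate(f))
    return 1 - stored_per_bucket / lam


print(simple_mean_field(h=4, load=0.95))
```

With h = 4 and load 0.95 the result is about 0.177, the same value the Poisson-based relaxation gives at a = 1, which is the sense in which SIMPLE attains the lower bound for a = 1.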
Performance of SIMPLE Scheme
The lower bound can
actually be achieved
for a=1
[h=4, load=0.95, m=5,000]
The GREEDY Scheme

Using uniform hashes, try to insert each
element greedily until it is either inserted or d hashes have been used
[Figure: GREEDY example with d=2; buckets of height h, overflow CAM]
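A simulation sketch of GREEDY, assuming uniform hashes simulated with `random` and a hypothetical function name. Each element stops at the first bucket with room, so the average time a can be well below the worst case d.

```python
import random


def greedy_scheme(n, m, h, d, seed=0):
    """GREEDY: each element tries up to d uniform hashes in sequence and is
    stored in the first bucket with room; after d failures it overflows.

    Returns (overflow count, average time a, worst observed accesses).
    """
    rng = random.Random(seed)
    load = [0] * m
    overflow = 0
    total = 0
    worst = 0
    for _ in range(n):
        tries = 0
        stored = False
        while tries < d and not stored:
            b = rng.randrange(m)  # one memory access per hash
            tries += 1
            if load[b] < h:
                load[b] += 1
                stored = True
        if not stored:
            overflow += 1
        total += tries
        worst = max(worst, tries)
    return overflow, total / n, worst


n, m, h = 19_000, 5_000, 4  # load 0.95
for d in (1, 2, 4):
    ov, a, worst = greedy_scheme(n, m, h, d)
    print(f"d={d}: overflow rate {ov / n:.4f}, average a {a:.3f}, worst {worst}")
```

With d = 1 the scheme reduces to SIMPLE.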
The GREEDY Scheme: Proof Intuition




Un-coloring argument:
2nd try of a collided element → new element with 1 hash
(GREEDY with x elements, i.e. x·a(x) hashes) ≡ (SIMPLE with x·a(x) elements)
Optimal: for any x ≤ n elements
Optimality holds until no more elements can be added:
cut-off point a_co ≡ a(n)
[Figure: un-coloring of GREEDY's retries; buckets of height h, overflow CAM]
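The un-coloring argument can be checked directly in simulation: record every hash GREEDY uses, then feed that sequence to SIMPLE as if each hash were a separate one-hash element. Both processes change a bucket's load in exactly the same cases, so the final loads coincide. The helper names are hypothetical, and hashes are simulated with `random`.

```python
import random


def greedy_with_trace(n, m, h, d, seed=0):
    """Run GREEDY and record every hash it uses, in order."""
    rng = random.Random(seed)
    load = [0] * m
    trace = []
    for _ in range(n):
        for _ in range(d):
            b = rng.randrange(m)
            trace.append(b)
            if load[b] < h:
                load[b] += 1
                break  # inserted; a failed last try means overflow
    return load, trace


def simple_loads(hashes, m, h):
    """SIMPLE, treating each hash as a separate one-hash element."""
    load = [0] * m
    for b in hashes:
        if load[b] < h:
            load[b] += 1
    return load


n, m, h, d = 19_000, 5_000, 4, 4
g_load, trace = greedy_with_trace(n, m, h, d)
s_load = simple_loads(trace, m, h)
print(g_load == s_load, len(trace) / n)  # True, then GREEDY's average time a
```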
Performance of GREEDY Scheme
The GREEDY scheme is
always optimal until a_co
[d=4, h=4, load=0.95, m=5,000]
Performance of GREEDY Scheme
Overflow rate worse than 4-left,
but better throughput (1/a)
[d=4, h=4, load=0.95, m=5,000]
The MHT Scheme

MHT (Multi-Level Hash Table)
[Broder & Karlin]: d successive subtables,
each with its own hash function
[Figure: MHT example with 1st, 2nd, and 3rd subtables of buckets of height h, with an overflow CAM]
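A simulation sketch of MHT: subtable i has its own hash (simulated uniformly with `random`), and each element probes subtables 1, 2, ..., d in order until it finds a bucket with room. The geometric split and its `ratio` parameter are hypothetical illustrations of subtable sizes that fall geometrically, not values from the talk.

```python
import random


def geometric_sizes(m, d, ratio):
    """Split m buckets into d subtables whose sizes fall geometrically by
    `ratio` (hypothetical parameter); the rounding remainder goes to the first."""
    weights = [ratio ** i for i in range(d)]
    total = sum(weights)
    sizes = [int(m * w / total) for w in weights]
    sizes[0] += m - sum(sizes)
    return sizes


def mht_insert_all(n, sizes, h, seed=0):
    """MHT sketch: each element probes subtables in order, one uniform hash
    per subtable, and is stored in the first bucket with room; otherwise
    it overflows. Returns (overflow count, average accesses a)."""
    rng = random.Random(seed)
    tables = [[0] * s for s in sizes]
    overflow = 0
    accesses = 0
    for _ in range(n):
        for table in tables:
            b = rng.randrange(len(table))
            accesses += 1
            if table[b] < h:
                table[b] += 1
                break
        else:  # no subtable had room
            overflow += 1
    return overflow, accesses / n


n, m, h, d = 19_000, 5_000, 4, 4
sizes = geometric_sizes(m, d, ratio=0.5)  # sizes == [2668, 1333, 666, 333]
ov, a = mht_insert_all(n, sizes, h)
print(sizes, ov / n, a)
```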
Performance of MHT Scheme


Optimality of MHT until cut-off point a_co(MHT)
Proof that subtable sizes fall geometrically
Confirmed in simulations
Overflow rate close to 4-left, with
much better throughput (1/a)
[d=4, h=4, load=0.95, m=5,000]
Conclusion
Established “capacity region” of high-speed hashing
Showed that three schemes are optimal on different ranges
MHT is optimal when subtable sizes fall geometrically, a long-known rule of thumb
The MHT cut-off point is larger than the GREEDY one
Thank you.