Optimal Fast Hashing
Yossi Kanizo (Technion, Israel)
Joint work with Isaac Keslassy (Technion, Israel) and David Hay (Politecnico di Torino, Italy)

Hash Tables for Networking Devices
Hash tables and hash-based structures are often used in high-speed devices:
- Heavy-hitter flow identification
- Flow state keeping
- Flow counter management
- Virus signature scanning
- IP address lookup algorithms
For hash tables, ideally 1 memory access per element insertion → maximize throughput and minimize power.

Hash Tables for Networking Devices
Collisions are unavoidable → wasted memory accesses.
For load ≤ 1, let a and d be the average and worst-case time (number of memory accesses) per element insertion.
- Initially empty buckets
- Only insertions (no deletions)
Objective: minimize a and d.
[Figure: memory with buckets 1 to 9]

Why We Care
- On-chip memory: memory accesses → power consumption
- Off-chip memory: memory accesses → lost on/off-chip pin capacity
- Datacenters: memory accesses → network and server load
Parallelism does not help reduce these costs: d serial or parallel memory accesses have the same cost.

Traditional Hash Table Schemes
Example 1: linked lists (chaining)
[Figure: elements 1 to 9 chained into memory buckets]
Example 2: linear probing (open addressing)
[Figure: elements 1 to 9 probed into memory buckets]
Problem: the worst-case time d cannot be bounded by a constant.

High-Speed Hardware
Enable overflows: if time exceeds d → overflow list.
- Can be stored in an expensive CAM
- Otherwise, overflow elements = lost elements
Each bucket contains h elements, e.g., a 128-bit memory word holds h = 4 elements of 32 bits.
Assumption: access cost (read and write one word) = 1 cycle.
[Figure: memory buckets of height h with an overflow CAM]

Problem Formulation
Given average time a and worst-case time d, minimize the overflow rate.
[Figure: memory buckets of height h with an overflow CAM]

Example: Power of d Random Choices
d hash functions: pick the least loaded bucket. Break ties uniformly at random [Azar et al.]
or to the left [Vöcking].
Intuition: can reach low …, but average time a = worst-case time d → wasted memory accesses.
[Figure: memory buckets of height h with an overflow CAM]

Main Results
- Lower bound on overflow for any scheme
- Optimality of three schemes on successively larger ranges: SIMPLE, GREEDY, MHT (optimal when subtable sizes fall geometrically)

Overflow Lower Bound
Objective: given any online scheme with average time a and worst-case time d, find a lower bound on its overflow rate.
No scheme can achieve an overflow rate below this bound (capacity region).
[Plot: capacity region; h = 4, load = n/(mh) = 0.95, fixed d]

Overflow Lower Bound
Problem: the number of hashes of each element depends on the instantaneous memory state. How can we bound the overflow?
[Figure: memory buckets of height h with an overflow CAM]

Overflow Lower Bound: Proof Intuition
Assume hashes are uniform. Then relax the constraints:
- Offline
- No worst-case d
- Uncolor the hashes: (n elements) × (a hashes per element) = an uncolored hashes
Lower-bound the expected number of unhashed memory bins.
[Figure: uncolored hashes landing in memory buckets of height h]

Overflow Lower Bound
Result: a closed-form lower-bound formula, given n elements in m buckets of height h.
Valid also for non-uniform hashes.
Defines a capacity region for high-throughput hashing.

Lower-Bound Example
For a 3% overflow rate, throughput can be at most 1/a = 2/3 of the memory rate.
[Plot: h = 4, load = n/(mh) = 0.95]

Overflow Lower Bound Example
d-left scheme: low overflow, but high average memory access rate a.
[Plot: h = 4, load = n/(mh) = 0.95, m = 5,000]

Main Results
- Lower bound on overflow for any scheme
- Optimality of three schemes on successively larger ranges: SIMPLE, GREEDY, MHT (optimal when subtable sizes fall geometrically)

The SIMPLE Scheme
SIMPLE scheme: a single hash function. Looks like a truncated linked list.
Intuition: the final state depends only on the hashes, not on the successive states → can uncolor elements.
[Figure: memory buckets of height h with an overflow CAM]

The SIMPLE Scheme: Proof Intuition
Same reasoning as the offline lower bound.
Result:
for a = 1, SIMPLE is optimal (i.e., it achieves the minimum overflow rate).
The formal proof relies on mean-field analysis (differential equations with a continuous-time fluid limit).
[Figure: memory state when all elements have been hashed]

Performance of SIMPLE Scheme
The lower bound can actually be achieved for a = 1.
[Plot: h = 4, load = 0.95, m = 5,000]

The GREEDY Scheme
Using uniform hashes, try to insert each element greedily until either it is inserted or d attempts have been made.
[Figure: GREEDY with d = 2]

The GREEDY Scheme: Proof Intuition
Un-coloring argument: the 2nd try of a collided element → a new element with 1 hash.
GREEDY with x elements (i.e., x·a(x) hashes) → SIMPLE with x·a(x) elements.
Optimal for any x ≤ n elements.
Optimality holds until no more elements can be added: cut-off point a_co ≡ a(n).
[Figure: memory buckets of height h with an overflow CAM]

Performance of GREEDY Scheme
The GREEDY scheme is always optimal until a_co.
[Plot: d = 4, h = 4, load = 0.95, m = 5,000]

Performance of GREEDY Scheme
Overflow rate worse than 4-left, but better throughput (1/a).
[Plot: d = 4, h = 4, load = 0.95, m = 5,000]

The MHT Scheme
MHT (Multi-Level Hash Table) [Broder & Karlin]: d successive subtables, each with its own hash function.
[Figure: 1st, 2nd and 3rd subtables with an overflow CAM]

Performance of MHT Scheme
- Optimality of MHT until cut-off point a_co(MHT)
- Proof that subtable sizes fall geometrically
- Confirmed in simulations
Overflow rate close to 4-left, with much better throughput (1/a).
[Plot: d = 4, h = 4, load = 0.95, m = 5,000]

Conclusion
- Established the "capacity region" of high-speed hashing
- Showed that three schemes are optimal on different ranges
- MHT is optimal when subtable sizes fall geometrically, a long-known rule of thumb
- The MHT cut-off point is larger than the GREEDY one

Thank you.
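
The un-coloring relaxation from the "Overflow Lower Bound: Proof Intuition" slide can be turned into a bound roughly as follows. This is a sketch, not necessarily the talk's exact closed form: it assumes uniform hashes and large m, so that the number K_i of uncolored hashes hitting bucket i is approximately Poisson, and it writes γ for the overflow fraction (notation assumed here, with r = n/(mh) the load).

```latex
% n elements, m buckets of height h, load r = n/(mh), a*n uncolored uniform hashes
\lambda = \frac{a\,n}{m} = a\,r\,h, \qquad K_i \approx \mathrm{Poisson}(\lambda)

% a bucket hit by K_i hashes can hold at most min(K_i, h) elements
\mathbb{E}[\text{stored}] \le m\,\mathbb{E}[\min(K,h)]
  = m\Big(h - \sum_{k=0}^{h-1} (h-k)\, e^{-\lambda}\frac{\lambda^{k}}{k!}\Big)

% overflow fraction gamma = 1 - stored/n, with n = r m h
\gamma \ge 1 - \frac{1}{r h}\Big(h - \sum_{k=0}^{h-1} (h-k)\, e^{-\lambda}\frac{\lambda^{k}}{k!}\Big)
```

At a = 1 every element has exactly one hash, so the right-hand side is SIMPLE's expected occupancy under the same approximation, which is consistent with the claim that SIMPLE attains the bound at a = 1.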
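
To see how the capacity region trades throughput 1/a against overflow, the Poisson-relaxation bound can be evaluated numerically. This is a minimal sketch: the function name `overflow_lower_bound` and the Poisson (large-m, uniform-hash) approximation are assumptions of this example, not the closed-form formula from the talk.

```python
import math


def overflow_lower_bound(a, load, h):
    """Poisson-relaxation sketch of a lower bound on the overflow fraction.

    a    -- average memory accesses (hashes) per element
    load -- n / (m * h)
    h    -- bucket height (elements per bucket)
    Assumes a*n uniform "uncolored" hashes over m buckets, approximated as
    independent Poisson counts per bucket (large-m assumption).
    """
    lam = a * load * h  # mean number of hashes per bucket: a*n/m
    p = math.exp(-lam)  # P(K = 0)
    unused = 0.0  # E[h - min(K, h)] = sum_{k<h} (h - k) * P(K = k)
    for k in range(h):
        unused += (h - k) * p
        p *= lam / (k + 1)  # P(K = k+1) from P(K = k)
    stored_per_bucket = h - unused  # E[min(K, h)]: at most this many elements fit
    return max(0.0, 1.0 - stored_per_bucket / (load * h))


if __name__ == "__main__":
    for a in (1.0, 1.25, 1.5, 2.0):
        bound = overflow_lower_bound(a, 0.95, 4)
        print(f"a={a:.2f}  throughput 1/a={1 / a:.2f}  overflow >= {bound:.4f}")
```

With h = 4 and load 0.95, this sketch gives about 17.7% at a = 1 and about 2.2% at a = 1.5; larger a (lower throughput) always lowers the bound.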
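
The four insertion schemes discussed in the talk (SIMPLE, GREEDY, MHT and d-left) can be compared with a small Monte Carlo simulation that counts one memory access per bucket probe and sends unplaced elements to the CAM. This is a hedged sketch, not the authors' simulator: the function `simulate`, the scheme labels, and the MHT subtable split `ratio` (a geometric share per subtable) are hypothetical choices made for illustration.

```python
import random


def simulate(scheme, n, m, h, d, seed=0, ratio=0.5):
    """Insert n elements into m buckets of h slots; return (overflow_rate, avg_accesses).

    Each bucket probe counts as one memory access; an element that is not
    placed goes to the overflow CAM. The scheme labels and the geometric MHT
    split `ratio` are hypothetical choices for this sketch.
    """
    rng = random.Random(seed)
    if scheme == "mht":
        weights = [ratio ** j for j in range(d)]  # hypothetical geometric subtable sizes
    elif scheme == "dleft":
        weights = [1.0] * d  # d-left: d equal-size subtables
    else:
        weights = [1.0]  # SIMPLE and GREEDY use a single table
    sizes = [max(1, int(m * w / sum(weights))) for w in weights]
    sizes[0] += m - sum(sizes)  # give rounding leftovers to the first subtable
    tables = [[0] * s for s in sizes]  # per-bucket occupancy counts

    accesses = overflow = 0
    for _ in range(n):
        placed = False
        if scheme == "simple":
            # One hash, one access; a full bucket means overflow.
            table = tables[0]
            b = rng.randrange(len(table))
            accesses += 1
            if table[b] < h:
                table[b] += 1
                placed = True
        elif scheme == "greedy":
            # Up to d fresh uniform hashes into the same table; stop at the first non-full bucket.
            table = tables[0]
            for _ in range(d):
                b = rng.randrange(len(table))
                accesses += 1
                if table[b] < h:
                    table[b] += 1
                    placed = True
                    break
        elif scheme == "mht":
            # Try subtable 1, then 2, ...: one hash and one access per subtable.
            for table in tables:
                b = rng.randrange(len(table))
                accesses += 1
                if table[b] < h:
                    table[b] += 1
                    placed = True
                    break
        elif scheme == "dleft":
            # Read all d candidates (d accesses); min() keeps the leftmost on ties.
            cands = [(t, rng.randrange(len(t))) for t in tables]
            accesses += d
            t, b = min(cands, key=lambda c: c[0][c[1]])
            if t[b] < h:
                t[b] += 1
                placed = True
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        if not placed:
            overflow += 1
    return overflow / n, accesses / n


if __name__ == "__main__":
    # n = 19,000 elements in m = 5,000 buckets of h = 4 gives load 0.95.
    for scheme, d in [("simple", 1), ("greedy", 4), ("mht", 4), ("dleft", 4)]:
        ov, a = simulate(scheme, n=19000, m=5000, h=4, d=d)
        print(f"{scheme:>6}: overflow={ov:.4f}  a={a:.3f}")
```

The run mirrors the talk's setting (h = 4, load 0.95, m = 5,000). d-left always pays a = d accesses, while GREEDY and MHT stop at the first non-full bucket, which is where their throughput advantage comes from. The geometric `ratio` is only a placeholder; the optimal MHT subtable sizes are derived in the work itself.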