Problem 5.1. Open Hashing. Problem 5.2. Cuckoo hashing.

Übungen zur Vorlesung Algorithmen und Datenstrukturen (D-MATH/CSE) FS 2017
A. Pilz, F. Friedrich
http://lec.inf.ethz.ch/DA/2017
Problem set # 5
23.3.2017 – 30.3.2017
Please provide your solutions to the following tasks using the ETH Codeboard submission system.
Problem 5.1. Open Hashing.
We consider hash functions h(k) for keys k for a hash table of size p, where p is a prime number.
a) Which of the following functions are useful hash functions? (Explain.)
• h(k) = digit sum of k
• h(k) = k(1 + p3 ) mod p
• h(k) = bp(rk − brkc)c, r ∈ R+ \ Q
b) Insert the keys 17, 6, 5, 8, 11, 28, 14, 15 in this order into an initially empty hash table of size 11.
Use open addressing with the hash function h(k) = k mod 11 and resolve the conflicts using
(i) linear probing
(ii) quadratic probing, and
(iii) double hashing with h0 (k) = 1 + (k mod 9)
in the following form.
17: h(17) = 6
__, __, __, __, __, __, 17, __, __, __, __
6: h(6) = 6
__, ...
c) Which problem occurs if the key 17 is removed from the hash tables in b), and how can you resolve
it? Which problems occur if many keys are removed from a hash table?
d) In this task we use the hash function h(k) = k mod p and resolve collisions using double hashing.
Let q be the largest prime smaller than p, h0 (k) the second hash function, and s(j, k) the probing
function. The complete hash function in the jth step is h(k) − j · h0 (k) mod p if h0 (k) is given. and
h(k) − s(j, k) mod p if s(j, k) is given. Decide which of the following choices of h0 (k) and s(j, k)
are reasonable (and which are not), and justify your answer.
• h0 (k) = dln(k + 1)e mod q
• s(j, k) = k j mod p
• s(j, k) = ((k · j) mod q) + 1
e) Which advantage does double hashing have when you compare it to quadratic probing?
Submission link: https://codeboard.ethz.ch/daex05t01
Problem 5.2. Cuckoo hashing.
Cuckoo hashing is a hashing technique that guarantees constant time in worst case for both query and
delete operations. The idea is to use two tables T1 and T2 of same size and two hash functions h1 and
h2 . A new key x is inserted at position h1 (x) in T1 . In case of a collision, the previously stored key y
is displaced to position h2 (y) in T2 . If this leads to another collision, the next key (currently stored in
T2 at position h2 (y)) is again inserted at its appropriate position in T1 , and so on.
In some cases, this procedure continues forever, i.e., the same configuration appears after some steps
of key displacements due to collisions.1
1 Practically, an upper bound logarithmic in n is defined for the number of tries and when this is exceeded, the tables
are rehashed with new hash functions or table sizes.
Übungen zur Vorlesung Algorithmen und Datenstrukturen (D-MATH/CSE), Blatt 5
2
(a) Given two tables of size 5 each and two hash functions h1 (k) = k mod 5 and h2 (k) = bk/5c mod 5.
Insert the keys 27, 2, and 32 in this order into initially empty hash tables.
(b) Find another key such that the insertion leads to an infinite sequence of key displacements. (In
such a case, one would define two new hash functions, allocate two new tables and store the keys
in these tables using the new hash functions.)
Submission link: https://codeboard.ethz.ch/daex05t02
Problem 5.3. Programming Exercise: Finding a Sub-Array
Given two arrays A = (a0 , . . . , an−1 ) and B = (b0 , . . . , bk−1 ) of integer values, implement a function
findOccurrence that returns the position the first occurrence of B in A (or indicates that A does not
contain B, which is, of course, also the case if k > n).
The straight-forward way of doing this is to iterate over the entries of A and compare the sub-array
starting at this position with B (using, e.g., the C++ function std::equals). However, in the worst
case, this results in O(nk) comparison operations, which we consider too slow.
We propose a solution with hashing: For some hash function h, compute a hash h(B) of B, and then,
for every i ∈ {0, . . . , n − k}, compute h((ai , ai+1 , . . . , ai+k−1 )). If this hash is equal to h(B), compare
(ai , ai+1 , . . . , ai+k−1 ) to B element by element (to make sure the hash was not the same accidentally,
which may happen with any hash function) and on success report the position of ai .
Now if we recompute h((ai , ai+1 , . . . , ai+k−1 )) from scratch for every i, this will still take O(nk) time
as before. But we may use “sliding window hash functions” that allow the following: if we know
h((ai , ai+1 , . . . , ai+k−1 )), we can compute h((ai+1 , ai+2 , . . . , ai+k )) in time O(1). One such function
could be just the sum of the elements of the sub-array; it can be updated just by subtracting ai and
adding ai+k , but this is not a very good hash function and can still generate many false collisions.
A better hash function is the following, which depends on two parameters c and m.
Hc,m ((ai , · · · , ai+k−1 )) =
k−1
X
ai+j ck−j−1 mod m .
j=0
The hash for the next window (without ai but with ai+k can be obtained from the previous hash:
Hc,m ((ai+1 , · · · , ai+k )) =
k
X
ai+j ck−j mod m = c · Hc,m ((ai , · · · , ai+k−1 )) + ai+k − ck ai mod m .
j=1
That is, you can update the previous hash by multiplying with c, adding ai+k , and subtracting ck ai ,
with the values taken modulo m. We propose using m = 215 and c = 1021.
Instructions
Write a C++ implementation of an O(n) expected time algorithm for finding a sub-array.
The codeboard project already contains a dummy implementation. It contains a generic function
It1 findOccurrance(const It1 from, const It1 to, const It2 begin, const It2 end) that returns
the first occurrence of the array between begin and end in the array between from and to, or to if
there is no occurrence.
The provided code already reads the input and calls the function.
Make sure that
• the algorithm computes ck only once,
• all computations are modulo m for all values in order not to get an overflow (recall the rules of
modular arithmetic), and
Übungen zur Vorlesung Algorithmen und Datenstrukturen (D-MATH/CSE), Blatt 5
• the values are always positive (e.g., by adding multiples of m).
Submission link: https://codeboard.ethz.ch/daex05t03
3