Lecture 8, Thursday 4/26/01
Hashing (Chapter 12)
• Heaps support:
  » Insert
  » Delete
  » Max/min
• How about Search?
  (How would you implement “find” in a heap?)
• Possible solutions:
  » Ordered array: slow “insert” - Ω(n).
  » Ordered list: both find and insert are slow!

Direct address table
• Maintain table T[i]:
  $T[i] = \begin{cases} x & \text{if } x \in T \text{ and } key(x) = i \\ \emptyset & \text{otherwise} \end{cases}$
• Disadvantage: too much memory!
• Idea: maintain small table:
  [Figure: universe of keys mapped into a small table]
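The direct address table can be sketched in a few lines of Python. This is only an illustration: the class name `DirectAddressTable`, the parameter `universe_size`, and the use of `None` for the empty slot are assumptions made here, and keys are assumed to be integers in the range 0 to `universe_size - 1`.

```python
class DirectAddressTable:
    """Direct address table: slot i holds the element whose key is i, or None."""

    def __init__(self, universe_size):
        # One slot per possible key. This is the memory cost the slide warns about.
        self.slots = [None] * universe_size

    def insert(self, key, value):
        self.slots[key] = value

    def search(self, key):
        # None plays the role of the "empty" marker.
        return self.slots[key]

    def delete(self, key):
        self.slots[key] = None


table = DirectAddressTable(universe_size=16)
table.insert(3, "three")
table.insert(7, "seven")
table.delete(7)
```

Every operation is a single array access, but the table needs as many slots as there are possible keys, no matter how few keys are actually stored.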
Collisions
• |Table| << |Universe of keys| - collisions!
  (collision = two keys map into the same slot in T)
• How to resolve collisions:
  » Chaining: keys that map to the same slot are kept in a list attached to that slot.
  » Open addressing: if A[h(x)] full - try next slot. What is “next”?

Analysis of Chaining
• Assume each key equally likely hashed to any slot.
• n keys, m slots; $\alpha = n/m$ = “load factor”
• Expected length of a chain: $\sum_{i=1}^{n} \frac{1}{m} = \frac{n}{m} = \alpha$
  ⇒ Access time = $O(1+\alpha)$
• Unsuccessful search:
  Expected length of a randomly chosen list + 1: $O(1+\alpha)$
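A minimal sketch of chaining, to make the load factor concrete. The class name `ChainedHashTable`, its method names, and the choice of Python's built-in `hash` reduced mod m as the hash function are assumptions for illustration, not part of the lecture.

```python
class ChainedHashTable:
    """Hash table with chaining: each slot holds a list of the keys hashed to it."""

    def __init__(self, num_slots):
        self.num_slots = num_slots                      # m
        self.chains = [[] for _ in range(num_slots)]
        self.num_keys = 0                               # n

    def _slot(self, key):
        # Assumption: built-in hash reduced mod m stands in for h(x).
        return hash(key) % self.num_slots

    def insert(self, key):
        chain = self.chains[self._slot(key)]
        if key not in chain:
            chain.append(key)
            self.num_keys += 1

    def search(self, key):
        # Cost is proportional to the length of one chain.
        return key in self.chains[self._slot(key)]

    def delete(self, key):
        chain = self.chains[self._slot(key)]
        if key in chain:
            chain.remove(key)
            self.num_keys -= 1

    def load_factor(self):
        return self.num_keys / self.num_slots           # alpha = n / m


table = ChainedHashTable(num_slots=8)
for key in range(20):
    table.insert(key)

average_chain_length = sum(len(c) for c in table.chains) / table.num_slots
```

With 20 keys in 8 slots the average chain length equals the load factor α = 20/8, which is the quantity the O(1+α) bound is stated in.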
Successful Search
• Assume that the key being searched for is equally likely to be any one of the keys stored.
• Expected time to find i-th element = time to insert i-th element
  » Conditioned on “key was the i-th element inserted”, expected time = $1 + \frac{i-1}{m}$
  » overall:
    $\frac{1}{n}\sum_{i=1}^{n}\left(1 + \frac{i-1}{m}\right) = 1 + \frac{1}{nm}\sum_{i=1}^{n}(i-1) = 1 + \frac{1}{nm}\cdot\frac{n(n-1)}{2} = 1 + \frac{\alpha}{2} - \frac{1}{2m} = O(1+\alpha)$
• Intuition: need to search 1/2 of a list on the average.

Open Addressing
• If A[h(x)] full, try “next” slot.
• Linear probing:
  » pick some integer b relatively prime to size of table m.
  » For i = 0, 1, 2, 3, … try to place x in position: (h(x) + b·i) mod m
  » Bad idea: results in large clusters. Increased search time and insert time as α → 1.
• Double hashing: works well in practice.
  » Pick two hash functions h1, h2
  » For i = 0, 1, 2, 3, … try to place x in position: (h1(x) + i·h2(x)) mod m
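The two probing rules can be compared with a small sketch. Everything specific here is an assumption chosen for illustration: integer keys, table size m = 11 (a prime), step b = 1 for linear probing, h(x) = h1(x) = x mod m, and h2(x) = 1 + (x mod (m − 1)), which is never zero and is relatively prime to a prime m. Deletion is left out.

```python
def probe_insert(table, key, probe):
    """Place key in the first empty slot along its probe sequence."""
    m = len(table)
    for i in range(m):
        slot = probe(key, i, m)
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no empty slot found")


def probe_search(table, key, probe):
    """Follow the same probe sequence; an empty slot means the key is absent."""
    m = len(table)
    for i in range(m):
        slot = probe(key, i, m)
        if table[slot] is None:
            return None
        if table[slot] == key:
            return slot
    return None


def linear_probe(key, i, m, b=1):
    # (h(x) + b*i) mod m, with the assumed h(x) = x mod m and b = 1
    return (key % m + b * i) % m


def double_hash_probe(key, i, m):
    # (h1(x) + i*h2(x)) mod m, with the assumed h1 and h2 described above
    return (key % m + i * (1 + key % (m - 1))) % m


M = 11
linear_table = [None] * M
double_table = [None] * M
for key in (10, 21, 43):          # all three keys have home slot 10
    probe_insert(linear_table, key, linear_probe)
    probe_insert(double_table, key, double_hash_probe)
```

All three keys collide at slot 10. Linear probing sends them to consecutive slots 10, 0, 1, while double hashing gives each key its own step size, so they land in slots 10, 1, 3.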
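Analysis of Open Addressing
• Simplifying assumption: h(key, probe #), random and uniform.
• Probability that at least i probes lead to already occupied slots?
  $q_i = \left(\frac{n}{m}\right)^i = \alpha^i$
• Expected # probes in unsuccessful search:
  $1 + \sum_{i=1}^{\infty} i \cdot \underbrace{\Pr[\text{exactly } i \text{ probes hit occupied slots}]}_{p_i} = 1 + \sum_{i=1}^{\infty} q_i = 1 + \sum_{i=1}^{\infty} \alpha^i = \frac{1}{1-\alpha}$
• Why? Write $p_i$ i times in row i. Row i sums to $i \cdot p_i$, and column j sums to $p_j + p_{j+1} + \dots = q_j$:

    p1                  = 1·p1
    p2   p2             = 2·p2
    p3   p3   p3        = 3·p3
    …
    q1   q2   q3   …    ⇒ $\sum_{i=1}^{\infty} i\,p_i = \sum_{i=1}^{\infty} q_i$

More open addressing
• What about successful search?
  Depends on the element: element inserted earlier will be easier to find!
• Assume uniform distribution on the element we search for.
• If element was inserted at (i+1)-th step (i keys already in the table), expected number of probes was $\le \frac{1}{1 - i/m} = \frac{m}{m-i}$
• Condition on i, take expectation:
  $\le \frac{1}{n}\sum_{i=0}^{n-1}\frac{m}{m-i} = \frac{m}{n}\sum_{i=0}^{n-1}\frac{1}{m-i} = \frac{1}{\alpha}\left[H_m - H_{m-n}\right]$
• But: $\ln i \le H_i \le \ln i + 1$
• Thus expected # probes:
  $\le \frac{1}{\alpha}\left[\ln m + 1 - \ln(m-n)\right] = \frac{1}{\alpha}\left[\ln\frac{m}{m-n} + 1\right] = \frac{1}{\alpha}\left[\ln\frac{1}{1-\alpha} + 1\right]$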
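The bounds above can be checked numerically. The sketch below is only a sanity check under the slides' simplifying assumption; the function names and the sample values m = 1000, n = 900 are chosen here for illustration.

```python
from math import log


def harmonic(k):
    """H_k = 1 + 1/2 + ... + 1/k."""
    return sum(1.0 / j for j in range(1, k + 1))


def unsuccessful_probes(alpha):
    # Expected probes in an unsuccessful search: 1 / (1 - alpha)
    return 1.0 / (1.0 - alpha)


def successful_probes_sum(n, m):
    # (1/n) * sum_{i=0}^{n-1} m / (m - i)
    return sum(m / (m - i) for i in range(n)) / n


def successful_probes_log_bound(alpha):
    # (1/alpha) * (ln(1 / (1 - alpha)) + 1)
    return (log(1.0 / (1.0 - alpha)) + 1.0) / alpha


m, n = 1000, 900
alpha = n / m

via_sum = successful_probes_sum(n, m)
via_harmonic = (harmonic(m) - harmonic(m - n)) / alpha
log_bound = successful_probes_log_bound(alpha)
```

The sum and the harmonic-number form agree up to rounding, and both stay below the logarithmic bound. At α = 0.9 an unsuccessful search is expected to take 1/(1 − α) = 10 probes, which is well above the bound for a successful search.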