Chapter 10 Search

Ming Zhang "Data Structures and Algorithms"
Data Structures
and Algorithms(10)
Instructor: Ming Zhang
Textbook Authors: Ming Zhang, Tengjiao Wang and Haiyan Zhao
Higher Education Press, 2008.6 (the "Eleventh Five-Year" national planning textbook)
https://courses.edx.org/courses/PekingX/04830050x/2T2014/
Chapter 10
Search
目录页
10.3 Search in a Hash Table
Chapter 10. Search
•
•
•
•
10.1 Search in a linear list
10.2 Search in a set
10.3 Search in a hash table
Summary
2
Ming Zhang “Data Structures and Algorithms”
Chapter 10
Search
目录页
10.3 Search in a Hash Table
Search in a Hash Table
•
•
•
•
•
•
10.3.0
10.3.1
10.3.2
10.3.3
10.3.4
10.3.5
Basic problems in hash tables
Collisions resolution
Open hashing
Closed hashing
Implementation of closed hashing
Efficiency analysis of hash methods
3
Ming Zhang “Data Structures and Algorithms”
Chapter 10
Search
目录页
10.3 Search in a Hash Table
Open Hashing
{77、14、75、 7、110、62 、95 }
h(Key) = Key % 11
0
1
2
3
4
110
77
14
5
6
7
7
8
9
75
• The empty cells in the table
should be marked by special
values
• like -1 or INFINITY
• Or make the contents of hash
table to be pointers, and the
contents of empty cells are null
pointers
62
95
10
4
Ming Zhang “Data Structures and Algorithms”
Chapter 10
Search
目录页
10.3 Search in a Hash Table
Performance Analysis of Chaining Method
• Give you a table of size M which
contains n records. The hash
function (in the best case) put
records evenly into the M positions
of the table which makes each chain
contains n/M records on the average
• When M>n, the average cost of hash
method is Θ(1)
5
Ming Zhang “Data Structures and Algorithms”
Chapter 10
Search
目录页
10.3 Search in a Hash Table
10.3.3 Closed Hashing
• d0=h(K) is called the base address of K.
• When a collision occurs, use some method to generate a
sequence of hash addresses for key K
d1, d2, ... di , ..., dm-1
• All the di (0<i<m) are the successive hash addresses
• With different way of probing, we get different ways to resolve
collisions.
• Insertion and search function both assume that the probing
sequence for each key has at least one empty cell
• Otherwise it may get into a endless loop
• We can also limit the length of probing sequence
6
Ming Zhang “Data Structures and Algorithms”
Chapter 10
Search
目录页
10.3 Search in a Hash Table
Problem may Arise - Clustering
• Clustering
• Nodes with different hash addresses
compete for the same successive hash
address
• Small clustering may merge into large
clustering
• Which leads to a very long probing
sequence
7
Ming Zhang “Data Structures and Algorithms”
Chapter 10
Search
目录页
10.3 Search in a Hash Table
Several General Closed Hashing Methods
•
•
•
•
1.
2.
3.
4.
Linear probing
Quadratic probing
Pseudo-random probing
Double hashing
8
Ming Zhang “Data Structures and Algorithms”
Chapter 10
10.3 Search in a Hash Table
Search
目录页
1. Linear probing
• Basic idea:
• If the base address of a record is occupied, check
the next address until an empty cell is found
• Probe the following cells in turn: d+1, d+2, ......, M-1,
0, 1, ......, d-1
• A simple function used for the linear probing:
p(K,i) = I
• Advantages:
• All the cell of the table can be candidate cells for
the new record inserted
9
Ming Zhang “Data Structures and Algorithms”
Chapter 10
10.3 Search in a Hash Table
Search
目录页
Instance of Hash Table
• M = 15, h(key) = key%13
• In the ideal case, all the empty cells in the table should
have a chance to accept the record to be inserted
• The probability of the next record to be inserted at the 11th cell
is 2/15
• The probability to be inserted at the 7th cell is 11/15
0
26
1
2
3
4
5
6
25
41
15
68 44
6
10
7
8
9 22212
10 11
36
Ming Zhang “Data Structures and Algorithms”
12
38
13
12
14
51
Chapter 10
Search
目录页
10.3 Search in a Hash Table
Enhanced Linear Probing
• Every time skip constant c cells rather than 1
• The ith cell of probing sequence is
(h(K) + ic) mod M
• Records with adjacent base address would not get the
same probing sequence
• Probing function is p(K,i) = i*c
• Constant c and M must be co-prime
11
Ming Zhang “Data Structures and Algorithms”
Chapter 10
Search
目录页
10.3 Search in a Hash Table
Example: Enhance Linear Probing
• For instance, c = 2, The keys to be
inserted, k1 and k2. h(k1) = 3, h(k2) = 5
• Probing sequences
• The probing sequence of k1: 3, 5, 7, 9, ...
• The probing sequence of k2: 5, 7, 9, ...
• The probing sequences of k1 and k2 are
still intertwine with each other, which
leads to clustering.
12
Ming Zhang “Data Structures and Algorithms”
Chapter 10
Search
目录页
10.3 Search in a Hash Table
2. Quadratic probing
• Probing increment sequence: 12, -12, 22,
-22, ..., The address formula is
d2i-1 = (d +i2) % M
d2i = (d – i2) % M
• A function for simple linear probing:
p(K, 2i-1) = i*i
p(K, 2i) = - i*i
13
Ming Zhang “Data Structures and Algorithms”
Chapter 10
Search
目录页
10.3 Search in a Hash Table
Example: Quadratic Probing
• Example: use a table of size M = 13
• Assume for k1 and k2, h(k1)=3, h(k2)=2
• Probing sequences
• The probing sequence of k1: 3, 4, 2, 7, ...
• The probing sequence of k2: 2, 3, 1, 6, ...
• Although k2 would take the base address of
k1 as the second address to probe, but their
probing sequence will separate from each
other just after then
14
Ming Zhang “Data Structures and Algorithms”
Chapter 10
Search
目录页
10.3 Search in a Hash Table
3. Pseudo-Random Probing
• Probing function p(K,i) = perm[i - 1]
• here perm is an array of length M – 1
• It contains a random permutation of numbers between 1 and M
// generate a pseudo-random permutation of n numbers
void permute(int *array, int n) {
for (int i = 1; i <= n; i ++)
swap(array[i-1], array[Random(i)]);
}
15
Ming Zhang “Data Structures and Algorithms”
Chapter 10
Search
目录页
10.3 Search in a Hash Table
Example: Pseudo-Random Probing
• Example: consider a table of size M = 13,
perm[0] = 2, perm[1] = 3, perm[2] = 7.
• Assume 2 keys k1 and k2, h(k1)=4, h(k2)=2
• Probing sequences
• The probing sequence of k1: 4, 6, 7, 11, ...
• The probing sequence of k2: 2, 4, 5, 9, ...
• Although k2 would take the base address
of k1 as the second address to probe, but
their probing sequence will separate from
each other just after then
16
Ming Zhang “Data Structures and Algorithms”
Chapter 10
Search
目录页
10.3 Search in a Hash Table
Secondary Clustering
• Eliminate the primary clustering
• Probing sequences of keys with different base address
overlap
• Pseudo-random probing and quadratic probing can eliminate
it
• Secondary clustering
• The clustering is caused by two keys which are hashed to one
base address, and have the same probing sequence
• Because the probing sequence is merely a function that
depends on the base address but not the original key.
• Example: pseudo-random probing and quadratic probing
17
Ming Zhang “Data Structures and Algorithms”
Chapter 10
Search
目录页
10.3 Search in a Hash Table
4. Double Probing
• Avoid secondary clustering
• The probing sequence is a function that depends
on the original key
• Not only depends on the base address
• Double probing
• Use the second hash function as a constant
• p(K, i) = i * h2 (key)
• Probing sequence function
• d = h1(key)
• di = (d + i h2 (key)) % M
18
Ming Zhang “Data Structures and Algorithms”
Chapter 10
Search
目录页
10.3 Search in a Hash Table
Basic ideas of Double Probing
• The double probing uses two hash functions h1 and h2
• If collision occurs at address h1(key) = d, then compute
h2(key), the probing sequence we get is :
(d+h2(key)) % M,(d+2h2 (key)) %M ,(d+3h2 (key)) % M ,...
• It would be better if h2 (key) and M are co-prime
• Makes synonyms that cause collision distributed evenly in the table
• Or it may cause circulation computation of addresses of synonyms
• Advantages: hard to produce “clustering”
• Disadvantages: more computation
19
Ming Zhang “Data Structures and Algorithms”
Chapter 10
Search
目录页
10.3 Search in a Hash Table
Method of choosing M and h2(k)
• Method1: choose a prime M, the return values of h2 is in the
range of
1 ≤ h2(K) ≤ M – 1
• Method2: set M = 2m ,let h2 returns an odd number between 1
and 2m
• Method3: If M is a prime, h1(K) = K mod M
• h2 (K) = K mod(M-2) + 1
• or h2(K) = [K / M] mod (M-2) + 1
• Method4: If M is a arbitrary integer, h1(K) = K mod p (p is the
maximum prime smaller than M)
• h2 (K) = K mod q + 1 (q is the maximum prime smaller than p)
20
Ming Zhang “Data Structures and Algorithms”
Chapter 10
Search
目录页
10.3 Search in a Hash Table
Thinking
• When inserting synonyms, how to
organize synonyms chain?
• What kind of relationship do the
function of double hashing h2 (key) and
h1 (key) have?
21
Ming Zhang “Data Structures and Algorithms”
Ming Zhang “Data Structures and Algorithms”
Data Structures
and Algorithms
Thanks
the National Elaborate Course (Only available for IPs in China)
http://www.jpk.pku.edu.cn/pkujpk/course/sjjg/
Ming Zhang, Tengjiao Wang and Haiyan Zhao
Higher Education Press, 2008.6 (awarded as the "Eleventh Five-Year" national planning textbook)