Hashing, Bitmap Indexing

Chapter 11: Indexing and Hashing

Indexing
 Basic Concepts
 Ordered Indices
 B+-Tree Index Files

Hashing
 Static Hashing
 Dynamic Hashing

Hashing


Static hashing
Dynamic hashing
Hashing
 key → h(key) → bucket that stores <key>
 Buckets (typically 1 disk block each)

Example hash function
 Key = ‘x1 x2 … xn’, an n-byte character string
 Have b buckets
 h: add x1 + x2 + … + xn, then compute the sum modulo b
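
The additive hash just described can be sketched in a few lines of Python. The function name `additive_hash` and the choice of UTF-8 bytes are assumptions made for illustration, not part of the slides.

```python
def additive_hash(key: str, b: int) -> int:
    """Sum the byte values of the key and reduce the sum modulo b buckets."""
    if b <= 0:
        raise ValueError("b must be positive")
    return sum(key.encode("utf-8")) % b

# Addition ignores byte order, so anagrams always land in the same bucket.
print(additive_hash("abc", 4), additive_hash("cba", 4))
```

Because the sum does not depend on the order of the bytes, keys that are permutations of each other always collide.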

 This may not be the best function …
Good hash function: the expected number of keys/bucket is the same for all buckets

Within a bucket:


Do we keep keys sorted?
Yes, if CPU time is critical and inserts/deletes are not too frequent

Next: an example to illustrate inserts, overflows, and deletes

EXAMPLE: 2 records/bucket
INSERT:
 h(a) = 1
 h(b) = 2
 h(c) = 1
 h(d) = 0
 h(e) = 1
Resulting buckets:
 0: d
 1: a, c → overflow block: e
 2: b
 3: (empty)
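
A minimal Python sketch of this insert sequence, assuming each bucket is one block of 2 keys with a chain of overflow blocks behind it. The class name `Bucket` and the lookup table `H` (which simply copies the hash values from the example) are hypothetical.

```python
class Bucket:
    """One block holding up to `capacity` keys, plus an optional overflow block."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.keys = []
        self.overflow = None  # next block in the chain, if any

    def insert(self, key):
        if len(self.keys) < self.capacity:
            self.keys.append(key)
            return
        if self.overflow is None:
            self.overflow = Bucket(self.capacity)
        self.overflow.insert(key)

    def chain(self):
        """Return the keys block by block, primary block first."""
        blocks, block = [], self
        while block is not None:
            blocks.append(list(block.keys))
            block = block.overflow
        return blocks


# Hash values copied from the example; a real table would compute them.
H = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}
table = [Bucket(capacity=2) for _ in range(4)]
for key in "abcde":
    table[H[key]].insert(key)

for number, bucket in enumerate(table):
    print(number, bucket.chain())
```

Key e hashes to bucket 1, which already holds a and c, so it goes into a new overflow block.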
EXAMPLE: deletion
Delete: e, f, c
 0: a
 1: b, c → overflow block: d
 2: e
 3: f, g
After deleting f: maybe move “g” up
After deleting c: d can move up from the overflow block
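
One possible deletion policy, sketched in Python under stated assumptions: the starting layout below is a reconstruction of the example (bucket 1 holding b and c with d in an overflow block), and the helper `delete` always repacks the chain so that later keys move up. The slides only say "maybe" move keys up, so the repacking is a choice made here, not a rule.

```python
CAPACITY = 2  # records per block, as in the example


def delete(chain, key):
    """Remove `key` from a bucket chain (a list of blocks, primary first),
    then repack the remaining keys so emptied overflow blocks disappear."""
    keys = [k for block in chain for k in block]
    keys.remove(key)  # raises ValueError if the key is absent
    repacked = [keys[i:i + CAPACITY] for i in range(0, len(keys), CAPACITY)]
    return repacked or [[]]  # always keep the primary block


# Assumed starting layout, reconstructed from the example.
table = {0: [["a"]], 1: [["b", "c"], ["d"]], 2: [["e"]], 3: [["f", "g"]]}
H = {"e": 2, "f": 3, "c": 1}  # hash values implied by that layout
for key in ["e", "f", "c"]:
    table[H[key]] = delete(table[H[key]], key)

print(table)
```

With this policy, deleting c pulls d into the primary block of bucket 1 and the overflow block is dropped.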
Rule of thumb:



Try to keep space utilization between 50% and 80%
Utilization = (# keys used) / (total # keys that fit)
If < 50%, wasting space
If > 80%, overflows become significant
 (depends on how good the hash function is and on # keys/bucket)
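
As a small worked example, the rule of thumb can be written as a Python check. The function names are hypothetical, and "total # keys that fit" is assumed here to count only the primary buckets, not overflow blocks.

```python
def utilization(keys_used: int, buckets: int, keys_per_bucket: int) -> float:
    """Fraction of key slots in the primary buckets that are occupied."""
    return keys_used / (buckets * keys_per_bucket)


def verdict(u: float) -> str:
    if u < 0.5:
        return "wasting space"
    if u > 0.8:
        return "overflows significant"
    return "within the 50%-80% band"


# Five keys in four buckets of two keys each.
u = utilization(keys_used=5, buckets=4, keys_per_bucket=2)
print(f"{u:.1%}: {verdict(u)}")
```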
How do we cope with growth?


Overflows and reorganizations
Dynamic hashing
 Extensible
 Linear

Extensible hashing: two ideas
(a) Use i of the b bits output by the hash function
 h(K) → 00110101 (b bits)
 use the leading i bits; i grows over time …

(b) Use a directory
 h(K)[i] → directory entry → pointer to bucket

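Example: h(k) is 4 bits; 2 keys/bucket
i = 1
 0 → bucket [0001], 1 bit used
 1 → bucket [1001, 1100], 1 bit used
Insert 1010
 bucket 1 is full, so it splits into [1001, 1010] and [1100]; bits used go 1 → 2
New directory
i = 2
 00 → bucket [0001], 1 bit used
 01 → same bucket as 00
 10 → bucket [1001, 1010], 2 bits used
 11 → bucket [1100], 2 bits used
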
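Example continued
i = 2
Insert: 0111, 0000
 0111 joins 0001 in the bucket shared by 00 and 01
 0000 finds that bucket full, so it splits; bits used go 1 → 2 and the directory stays the same size
After the inserts:
 00 → bucket [0000, 0001], 2 bits used
 01 → bucket [0111], 2 bits used
 10 → bucket [1001, 1010], 2 bits used
 11 → bucket [1100], 2 bits used
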
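Example continued
i = 2
 00 → bucket [0000, 0001], 2 bits used
 01 → bucket [0111], 2 bits used
 10 → bucket [1001, 1010], 2 bits used
 11 → bucket [1100], 2 bits used
Insert: 1001
 bucket 10 is full, so it splits into [1001, 1001] and [1010]; bits used go 2 → 3
New directory
i = 3
 000, 001 → bucket [0000, 0001]
 010, 011 → bucket [0111]
 100 → bucket [1001, 1001]
 101 → bucket [1010]
 110, 111 → bucket [1100]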
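
The three example slides can be replayed with a short Python sketch of extensible hashing. This is one way to implement the two ideas under stated assumptions, not a definitive implementation: the class and attribute names are hypothetical, keys are passed in already hashed, and a bucket that cannot split any further simply raises an error instead of getting an overflow block.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth  # number of leading bits this bucket is keyed on
        self.keys = []


class ExtensibleHash:
    def __init__(self, bits=4, capacity=2):
        self.bits = bits          # b: width of h(K) in bits
        self.capacity = capacity  # keys per bucket
        self.i = 1                # leading bits currently used by the directory
        self.directory = [Bucket(1), Bucket(1)]

    def _slot(self, h):
        return h >> (self.bits - self.i)  # leading i bits of h

    def insert(self, h):
        """Insert an already-hashed value h (0 <= h < 2**bits)."""
        while True:
            bucket = self.directory[self._slot(h)]
            if len(bucket.keys) < self.capacity:
                bucket.keys.append(h)
                return
            if bucket.depth == self.bits:
                raise OverflowError("bucket cannot split further")
            if bucket.depth == self.i:
                # Double the directory: each old entry becomes two adjacent entries.
                self.directory = [b for b in self.directory for _ in range(2)]
                self.i += 1
            self._split(bucket)

    def _split(self, bucket):
        bucket.depth += 1
        sibling = Bucket(bucket.depth)
        shift = self.bits - bucket.depth
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            # The newly examined bit decides which half keeps the key.
            (sibling if (k >> shift) & 1 else bucket).keys.append(k)
        # Repoint the directory entries whose new bit is 1 to the sibling.
        for slot, b in enumerate(self.directory):
            if b is bucket and (slot >> (self.i - bucket.depth)) & 1:
                self.directory[slot] = sibling


table = ExtensibleHash(bits=4, capacity=2)
for h in [0b0001, 0b1001, 0b1100, 0b1010, 0b0111, 0b0000, 0b1001]:
    table.insert(h)
for slot, bucket in enumerate(table.directory):
    print(f"{slot:03b} -> {[f'{k:04b}' for k in bucket.keys]}")
```

The directory only doubles when the overflowing bucket already uses all i bits; otherwise the bucket splits and the existing directory entries are repointed.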
Extensible hashing: deletion


No merging of blocks
Merge blocks and cut directory if possible
 (Reverse insert procedure)

Deletion example:

Run through the insert example in reverse!

Summary
Extensible hashing
+ Can handle growing files
   - with less wasted space
   - with no full reorganizations
- Indirection
   (Not bad if directory in memory)
- Directory doubles in size
   (Now it fits, now it does not)

Advanced indexing


Multiple attributes
Bitmap indexing
Multiple-Key Access


Use multiple indices for certain types of queries.
Example:
 select account-number
 from account
 where branch-name = “Perryridge” and balance = 1000

Possible strategies?

Indices on Multiple Attributes

Suppose we have an index on combined search-key (branch-name, balance).
 where branch-name = “PP” and balance = 1000
[Figure: index on (branch-name, balance) with entries AA,2000; AA,2300; AA,2500; AB,200; AC,200; BB,1000; CC,200; DD,200; DD,300; PP,300; PP,800; PP,1000; PP,1300; PP,1500; PP,1560]

Suppose we have an index on combined search-key (branch-name, balance).
 where branch-name = “PP” and balance < 1000
[Figure: index on (branch-name, balance); search PP,0 to find the start of the range and search PP,1000 to find its end]

Suppose we have an index on combined search-key (branch-name, balance).
 where branch-name < “PP” and balance = 1000?
NO!
[Figure: index on (branch-name, balance)]
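
The three cases can be compared with a sorted Python list standing in for the leaf level of the combined index. This is only a sketch: the entries are the ones visible in the figure, `range_scan` is a hypothetical helper, and balances are assumed to be non-negative integers.

```python
from bisect import bisect_left

# Entries taken from the figure, kept sorted on (branch-name, balance).
index = sorted([
    ("AA", 2000), ("AA", 2300), ("AA", 2500), ("AB", 200), ("AC", 200),
    ("BB", 1000), ("CC", 200), ("DD", 200), ("DD", 300), ("PP", 300),
    ("PP", 800), ("PP", 1000), ("PP", 1300), ("PP", 1500), ("PP", 1560),
])


def range_scan(lo, hi):
    """Entries e with lo <= e < hi: one contiguous slice of the sorted index."""
    return index[bisect_left(index, lo):bisect_left(index, hi)]


# branch-name = "PP" and balance = 1000: a single probe.
equal = range_scan(("PP", 1000), ("PP", 1001))
# branch-name = "PP" and balance < 1000: search (PP, 0), scan up to (PP, 1000).
below = range_scan(("PP", 0), ("PP", 1000))
# branch-name < "PP" and balance = 1000: the matches are not contiguous, so
# every entry before (PP, ...) has to be examined and filtered on balance.
scanned = range_scan(("", 0), ("PP", 0))
matches = [e for e in scanned if e[1] == 1000]

print(equal, below, matches, len(scanned))
```

The first two queries touch only the entries they return. The third reads all nine entries below PP to find a single match, which is why the combined index is a poor fit for it.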
Bitmap Indices

An index designed for multiple valued search keys

Bitmap Indices (Cont.)
[Figure: a relation with one bitmap for each unique value of gender and one for each unique value of income-level; each bitmap has size = table size. Example: the income-level value of record 3 is L1.]

Bitmap Indices (Cont.)

Some properties of bitmap indices



Number of bitmaps for each attribute?
Size of each bitmap?
When is the bitmap matrix sparse, and what attributes are good for bitmap indices?

Bitmap Indices (Cont.)

Bitmap indices are generally very small compared with relation size
 E.g. if a record is 100 bytes, the space for a single bitmap is 1/800 of the space used by the relation.
 If the number of distinct attribute values is 8, the bitmaps together are only 1% of the relation size
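
The arithmetic behind these two figures: a 100-byte record is 800 bits, and each bitmap spends one bit per record. A short Python check, with `bitmap_fraction` as a hypothetical helper name:

```python
def bitmap_fraction(record_bytes: int, distinct_values: int = 1) -> float:
    """Space of `distinct_values` bitmaps relative to the relation, at one bit
    per record per bitmap (the number of records cancels out)."""
    return distinct_values / (record_bytes * 8)


print(bitmap_fraction(100))     # one bitmap: 1/800
print(bitmap_fraction(100, 8))  # eight bitmaps: 8/800 = 1%
```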
What about insertion?
Deletion?
Bitmap Indices Queries
Sample query: Males with income level L1
10010 AND 10100 = 10000
What about the number of males with income level L1?
 even faster!
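
A minimal Python sketch of the sample query, using integers as bitmaps. The five rows are hypothetical, chosen only so that the bitmaps come out as 10010 and 10100 as in the query above; `build_bitmaps` is an illustrative helper.

```python
from collections import defaultdict

# Hypothetical rows (gender, income-level) chosen to match the sample bitmaps.
rows = [("m", "L1"), ("f", "L2"), ("f", "L1"), ("m", "L4"), ("f", "L3")]


def build_bitmaps(values):
    """One bitmap per distinct value; record 1 is the leftmost bit."""
    n = len(values)
    bitmaps = defaultdict(int)
    for position, value in enumerate(values):
        bitmaps[value] |= 1 << (n - 1 - position)
    return dict(bitmaps)


gender = build_bitmaps([g for g, _ in rows])
income = build_bitmaps([level for _, level in rows])

result = gender["m"] & income["L1"]
print(f"{gender['m']:05b} AND {income['L1']:05b} = {result:05b}")
print("count:", bin(result).count("1"))
```

The count needs only the number of 1 bits in the result, so the relation itself is never read.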
Bitmap Indices Queries

Queries are answered using bitmap operations



Intersection (and)
Union (or)
Complementation (not)
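
The three operations map directly onto integer bit operators in Python. In this sketch the two bitmaps are the ones from the sample query, and the names `N` and `MASK` are assumptions introduced to keep the complement to the table size.

```python
N = 5                # number of records (bitmap length)
MASK = (1 << N) - 1  # keeps results to N bits

male, level_l1 = 0b10010, 0b10100  # bitmaps from the sample query

intersection = male & level_l1  # males with income level L1
union = male | level_l1         # males, or anyone with income level L1
complement = ~male & MASK       # not male; the mask drops bits beyond the table

for name, bits in [("and", intersection), ("or", union), ("not", complement)]:
    print(f"{name:>3}: {bits:0{N}b}")
```

Complementation needs the mask because the bitmap is only as long as the table, while `~` on a Python integer flips every higher bit as well.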