ITCS 6163
Lecture 5
Indexing datacubes
Objective: speed queries up.
Traditional databases (OLTP): B-Trees
• Time and space logarithmic in the number of indexed keys.
• Dynamic, stable, and perform well under updates. (But OLAP is not about updates…)
Bitmaps:
• Space efficient
• Difficult to update (but we don’t care in DW).
• Can effectively prune searches before looking at data.
Bitmaps
R = (…., A,….., M)

R(A)  B8 B7 B6 B5 B4 B3 B2 B1 B0
 3     0  0  0  0  0  1  0  0  0
 2     0  0  0  0  0  0  1  0  0
 1     0  0  0  0  0  0  0  1  0
 2     0  0  0  0  0  0  1  0  0
 8     1  0  0  0  0  0  0  0  0
 2     0  0  0  0  0  0  1  0  0
 2     0  0  0  0  0  0  1  0  0
 0     0  0  0  0  0  0  0  0  1
 7     0  1  0  0  0  0  0  0  0
 5     0  0  0  1  0  0  0  0  0
 6     0  0  1  0  0  0  0  0  0
 4     0  0  0  0  1  0  0  0  0
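A minimal sketch of how such an index could be built, assuming the column values are integers in the range 0 .. C-1. The function name `build_value_list_index` is hypothetical, and the bitmaps are kept as plain Python lists for readability rather than packed bits.

```python
def build_value_list_index(column, cardinality):
    """Return {value: bitmap}; each bitmap is a list of 0/1 with one entry per row."""
    index = {v: [0] * len(column) for v in range(cardinality)}
    for row, value in enumerate(column):
        index[value][row] = 1  # exactly one bitmap gets a 1 for this row
    return index

# The column R(A) from the slide.
A = [3, 2, 1, 2, 8, 2, 2, 0, 7, 5, 6, 4]
index = build_value_list_index(A, cardinality=9)

# Rows satisfying A = 2 can be read straight off bitmap B2.
rows_with_2 = [row for row, bit in enumerate(index[2]) if bit]
```

Each row contributes a single 1 across the nine bitmaps, which is why the index stays small when the attribute cardinality is low.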
Query optimization
Consider a high-selectivity-factor query with predicates on two attributes.
Query optimizer: builds plans
(P1) Full relation scan (filter as you go).
(P2) Index scan on the predicate with the lower selectivity factor, followed by a scan of the temporary relation to filter out non-qualifying tuples using the other predicate. (Works well if the data is clustered on the first index key.)
(P3) Index scan for each predicate (separately), followed by a merge of the RID lists.
Query optimization (continued)
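[Figure: (P2) index on Pred1 -> tuple list1 (t1 … tn) -> blocks of data, filtered by Pred. 2 -> answer. (P3) index on Pred1 -> tuple list1, index on Pred2 -> tuple list2; the two lists are combined into a merged list -> answer.]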
Query optimization (continued)
When using bitmap indexes, (P3) can be an easy winner!
CPU operations on bitmaps (AND, OR, XOR, etc.) are more efficient than regular RID merges: just apply the bitwise operations to the bitmaps.
(With B-trees, you would have to scan the two RID lists and select the tuples that appear in both: a merge operation.)
Of course, you can build B-trees on the compound key, but we would need one for every compound predicate (an exponential number of trees…).
Bitmaps and predicates
A = a1 AND B = b2
(Bitmap for a1) AND (Bitmap for b2) = Bitmap for a1 and b2
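To make the contrast concrete, here is a small sketch that answers the same two-predicate query both ways: a two-pointer merge of sorted RID lists, and a single bitwise AND on bitmaps. The row ids are made up for illustration, and the helper names (`to_bitmap`, `to_rids`, `merge_rid_lists`) are hypothetical. Python integers stand in for packed bitmaps.

```python
def to_bitmap(rids):
    """Pack row ids into an int: bit i is set iff row i qualifies."""
    bitmap = 0
    for rid in rids:
        bitmap |= 1 << rid
    return bitmap

def to_rids(bitmap):
    """Unpack an int bitmap into a sorted list of row ids."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

def merge_rid_lists(list1, list2):
    """Intersect two sorted RID lists by scanning both of them."""
    i = j = 0
    out = []
    while i < len(list1) and j < len(list2):
        if list1[i] == list2[j]:
            out.append(list1[i])
            i += 1
            j += 1
        elif list1[i] < list2[j]:
            i += 1
        else:
            j += 1
    return out

rids_a1 = [0, 2, 5, 7, 9]  # hypothetical rows where A = a1
rids_b2 = [2, 3, 7, 8, 9]  # hypothetical rows where B = b2

via_merge = merge_rid_lists(rids_a1, rids_b2)
via_bitmap = to_rids(to_bitmap(rids_a1) & to_bitmap(rids_b2))
```

Both routes return the same rows; the bitmap route replaces the element-by-element scan with one word-wise AND.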
Tradeoffs
Dimension cardinality small: dense bitmaps.
Dimension cardinality large: sparse bitmaps; compression (and decompression) needed.
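The slide does not name a compression scheme. As an illustration only, run-length encoding is one simple option that works well on sparse bitmaps; the sketch below assumes bitmaps stored as 0/1 lists, and the names `rle_encode` and `rle_decode` are hypothetical.

```python
def rle_encode(bits):
    """Collapse a 0/1 list into a list of (bit, run_length) pairs."""
    runs = []
    for bit in bits:
        if runs and runs[-1][0] == bit:
            runs[-1] = (bit, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((bit, 1))              # start a new run
    return runs

def rle_decode(runs):
    """Expand (bit, run_length) pairs back into the original 0/1 list."""
    return [bit for bit, length in runs for _ in range(length)]

# A sparse bitmap: one qualifying row out of 100.
sparse = [0] * 40 + [1] + [0] * 59
encoded = rle_encode(sparse)
```

The 100-entry bitmap shrinks to three runs, but it has to be decoded (or the operators have to work on runs) before it can be AND-ed or OR-ed, which is the tradeoff the slide points at.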
Query strategy for Star joins
Maintain join indexes between fact table and dimension tables
[Figure: a fact table with columns Product, Type and Location, linked to a dimension table (Prod. a … k); a bitmap is kept for each location, type and product value (prod. a … prod. k), marking the matching fact-table rows.]
Strategy example
Aggregate all sales for products of location γ, δ or ε.
(Bitmap for γ) OR (Bitmap for δ) OR (Bitmap for ε) = Bitmap for predicate
Star-Joins
Select F.S, D1.A1, D2.A2, …, Dn.An
from F, D1, D2, …, Dn
where F.A1 = D1.A1
  and F.A2 = D2.A2
  …
  and F.An = Dn.An
  and D1.B1 = 'c1'
  and D2.B2 = 'p2'
  ….
Likely strategy:
For each Di, find the suitable values of Ai such that Di.Bi = 'xi' (unless you have a bitmap index for Bi).
Use the bitmap index on those Ai values to form a bitmap for the related rows of F (OR-ing the bitmaps).
At this stage you have n such bitmaps; the result is found by AND-ing them.
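The strategy can be sketched on a toy star schema with two dimensions. All table contents below are invented for illustration, and `join_index` and `dimension_bitmap` are hypothetical helper names. Bitmaps are Python integers with bit r standing for fact row r.

```python
from functools import reduce

# Hypothetical dimension tables: key Ai -> attribute Bi.
D1 = {1: "c1", 2: "c2", 3: "c1"}
D2 = {10: "p1", 20: "p2"}
# Hypothetical fact rows: (A1, A2, S).
F = [(1, 10, 5.0), (1, 20, 7.0), (2, 20, 1.0), (3, 20, 2.0), (3, 10, 4.0)]

def join_index(fact_rows, column):
    """Bitmap join index: key value -> bitmap of the fact rows carrying that key."""
    index = {}
    for r, row in enumerate(fact_rows):
        index[row[column]] = index.get(row[column], 0) | (1 << r)
    return index

def dimension_bitmap(dim, wanted, index):
    """Find the keys with Bi = wanted, then OR their fact-table bitmaps."""
    keys = [k for k, b in dim.items() if b == wanted]
    return reduce(lambda acc, k: acc | index.get(k, 0), keys, 0)

idx1 = join_index(F, 0)
idx2 = join_index(F, 1)
bitmaps = [dimension_bitmap(D1, "c1", idx1), dimension_bitmap(D2, "p2", idx2)]

# AND the per-dimension bitmaps, then touch only the qualifying fact rows.
answer = reduce(lambda a, b: a & b, bitmaps)
rows = [F[r] for r in range(len(F)) if (answer >> r) & 1]
total_sales = sum(s for _, _, s in rows)
```

Only the fact rows whose bit survives the final AND are fetched; the dimension tables are consulted just to pick the key values.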
Example
Selectivity per predicate = 0.01 (predicates on the dimension tables)
n predicates (statistically independent)
Total selectivity = $10^{-2n}$
Fact table = $10^8$ rows, n = 3,
tuples in answer = $10^8 / 10^6 = 100$ rows.
In the worst case these are in 100 different blocks… Still better than reading all the blocks in the relation (e.g., assuming 100 tuples/block, this would be $10^6$ blocks!)
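The arithmetic can be checked with a few lines, assuming independent predicates of equal selectivity as stated. `expected_rows` is a hypothetical helper; exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

def expected_rows(fact_rows, selectivity, n_predicates):
    """Expected answer size when n independent predicates share one selectivity."""
    return fact_rows * selectivity ** n_predicates

fact_rows = 10**8
answer_rows = expected_rows(fact_rows, Fraction(1, 100), 3)

tuples_per_block = 100
blocks_in_relation = fact_rows // tuples_per_block
worst_case_blocks = answer_rows  # every answer tuple sits in a different block
```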
Design Space of Bitmap Indexes
The basic bitmap design is called the Value-list index. The focus there is on the columns.
If we change the focus to the rows, the index becomes a set of attribute values (integers), one per tuple (row), each of which can be represented in a particular way.
For example, the value 5 is stored as the row 000100000.
We can encode this row in many ways...
Attribute value decomposition
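C = attribute cardinality
Consider a value of the attribute, v, and a sequence of numbers $\langle b_{n-1}, b_{n-2}, \ldots, b_1 \rangle$.
Also, define $b_n = \lceil C / \prod_{i=1}^{n-1} b_i \rceil$. Then v can be decomposed into a sequence of n digits $\langle v_n, v_{n-1}, v_{n-2}, \ldots, v_1 \rangle$ as follows:
$$
\begin{aligned}
v &= V_1 \\
  &= V_2 b_1 + v_1 \\
  &= V_3 (b_2 b_1) + v_2 b_1 + v_1 \\
  &\;\;\vdots \\
  &= v_n \Bigl(\prod_{j=1}^{n-1} b_j\Bigr) + \cdots + v_i \Bigl(\prod_{j=1}^{i-1} b_j\Bigr) + \cdots + v_2 b_1 + v_1
\end{aligned}
$$
where $v_i = V_i \bmod b_i$ and $V_i = \lfloor V_{i-1} / b_{i-1} \rfloor$.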
Number systems
How do you write 576 in:
<2,2,2,2,2,2,2,2,2>
<7,7,5,3>
<10,10,10> (decimal system!)
(Below, x/y = q | r gives the quotient q and the remainder r.)

<2,2,2,2,2,2,2,2,2>:
576/$2^9$ = 1 | 64, 64/$2^8$ = 0 | 64, 64/$2^7$ = 0 | 64, 64/$2^6$ = 1 | 0,
0/$2^5$ = 0 | 0, 0/$2^4$ = 0 | 0, 0/$2^3$ = 0 | 0, 0/$2^2$ = 0 | 0, 0/$2^1$ = 0 | 0, 0/$2^0$ = 0 | 0
576 = 1 x $2^9$ + 0 x $2^8$ + 0 x $2^7$ + 1 x $2^6$ + 0 x $2^5$ + 0 x $2^4$ + 0 x $2^3$ + 0 x $2^2$ + 0 x $2^1$ + 0 x $2^0$

<7,7,5,3>:
576/(7x7x5x3) = 576/735 = 0 | 576, 576/(7x5x3) = 576/105 = 5 | 51,
51/(5x3) = 51/15 = 3 | 6, 6/3 = 2 | 0
576 = 5 x (7x5x3) + 3 x (5x3) + 2 x (3) + 0

<10,10,10>:
576/100 = 5 | 76, 76/10 = 7 | 6, 6
576 = 5 x 10 x 10 + 7 x 10 + 6
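The decomposition is ordinary mixed-radix conversion, so the 576 examples can be checked mechanically. In this sketch the bases are passed most significant first, and `decompose` / `compose` are hypothetical names.

```python
def decompose(v, bases):
    """Split v into digits [v_n, ..., v_1] for bases [b_n, ..., b_1]."""
    digits = []
    V = v
    for b in reversed(bases):   # peel off v_1 first, then v_2, ...
        digits.append(V % b)    # v_i = V_i mod b_i
        V //= b                 # V_{i+1} = floor(V_i / b_i)
    if V != 0:
        raise ValueError("value does not fit in the given bases")
    return digits[::-1]

def compose(digits, bases):
    """Inverse of decompose: rebuild v from its digits (Horner's rule)."""
    v = 0
    for d, b in zip(digits, bases):
        v = v * b + d
    return v

binary = decompose(576, [2] * 10)
mixed = decompose(576, [7, 7, 5, 3])
decimal = decompose(576, [10, 10, 10])
```

Here `mixed` holds the digits 5, 3, 2, 0, that is 576 = 5 x 105 + 3 x 15 + 2 x 3 + 0.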
Bitmaps
R = (…., A,….., M) value-list index

R(A)  B8 B7 B6 B5 B4 B3 B2 B1 B0
 3     0  0  0  0  0  1  0  0  0
 2     0  0  0  0  0  0  1  0  0
 1     0  0  0  0  0  0  0  1  0
 2     0  0  0  0  0  0  1  0  0
 8     1  0  0  0  0  0  0  0  0
 2     0  0  0  0  0  0  1  0  0
 2     0  0  0  0  0  0  1  0  0
 0     0  0  0  0  0  0  0  0  1
 7     0  1  0  0  0  0  0  0  0
 5     0  0  0  1  0  0  0  0  0
 6     0  0  1  0  0  0  0  0  0
 4     0  0  0  0  1  0  0  0  0
Example
sequence <3,3> value-list index (equality)
(Bi^j is the bitmap for value j of component i.)

R(A)  B2^2 B2^1 B2^0   B1^2 B1^1 B1^0
 3     0    1    0      0    0    1     (1x3+0)
 2     0    0    1      1    0    0
 1     0    0    1      0    1    0
 2     0    0    1      1    0    0
 8     1    0    0      1    0    0
 2     0    0    1      1    0    0
 2     0    0    1      1    0    0
 0     0    0    1      0    0    1
 7     1    0    0      0    1    0
 5     0    1    0      1    0    0
 6     1    0    0      0    0    1
 4     0    1    0      0    1    0
Encoding scheme
Equality encoding: all bits are 0 except the one that corresponds to the value.
Range encoding: the vi rightmost bits are 0, the remaining bits are 1.
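A short sketch of the two encodings for a single digit v in base b, with bits listed from the highest bitmap down to B0. The function names are hypothetical. Under range encoding, bit j is 1 exactly when v <= j, which is the same as saying the v rightmost bits are 0.

```python
def equality_encode(v, b):
    """Bits for bitmaps b-1 down to 0: only bit v is 1."""
    return [1 if j == v else 0 for j in range(b - 1, -1, -1)]

def range_encode(v, b):
    """Bits for bitmaps b-1 down to 0: bit j is 1 iff v <= j."""
    return [1 if v <= j else 0 for j in range(b - 1, -1, -1)]

eq_5 = equality_encode(5, 9)
rng_3 = range_encode(3, 9)
rng_8 = range_encode(8, 9)
```

Under range encoding the highest bitmap is 1 for every row, so it carries no information and does not need to be stored.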
Range encoding
single component, base-9

R(A)  B8 B7 B6 B5 B4 B3 B2 B1 B0
 3     1  1  1  1  1  1  0  0  0
 2     1  1  1  1  1  1  1  0  0
 1     1  1  1  1  1  1  1  1  0
 8     1  0  0  0  0  0  0  0  0
 0     1  1  1  1  1  1  1  1  1
 7     1  1  0  0  0  0  0  0  0
 5     1  1  1  1  0  0  0  0  0
 6     1  1  1  0  0  0  0  0  0
 4     1  1  1  1  1  0  0  0  0
Example (revisited)
sequence <3,3> value-list index (equality)
(Bi^j is the bitmap for value j of component i.)

R(A)  B2^2 B2^1 B2^0   B1^2 B1^1 B1^0
 3     0    1    0      0    0    1     (1x3+0)
 2     0    0    1      1    0    0
 1     0    0    1      0    1    0
 2     0    0    1      1    0    0
 8     1    0    0      1    0    0
 2     0    0    1      1    0    0
 2     0    0    1      1    0    0
 0     0    0    1      0    0    1
 7     1    0    0      0    1    0
 5     0    1    0      1    0    0
 6     1    0    0      0    0    1
 4     0    1    0      0    1    0
Example
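sequence <3,3> range-encoded index
(Bi^j is the bitmap of component i that is 1 when digit vi <= j; the always-1 bitmaps Bi^2 are not stored.)

R(A)  B2^1 B2^0   B1^1 B1^0
 3     1    0      1    1
 2     1    1      0    0
 1     1    1      1    0
 2     1    1      0    0
 8     0    0      0    0
 2     1    1      0    0
 2     1    1      0    0
 0     1    1      1    1
 7     0    0      1    0
 5     1    0      0    0
 6     0    0      1    1
 4     1    0      1    0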
Design Space
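[Figure: the design space of bitmap indexes. One axis is the encoding scheme (equality, range); the other is the decomposition, running from a single component <b> through <b2,b1>, …, <b,b,…,b> to ⌈log2 C⌉ components. The Value-list and Bit-Sliced indexes are labeled points in this space.]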
RangeEval
Evaluates each range predicate by computing two bitmaps: the BEQ bitmap and either BGT or BLT.
RangeEval-Opt uses only <=:
A < v is the same as A <= v-1
A > v is the same as NOT (A <= v)
A >= v is the same as NOT (A <= v-1)
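The rewrite rules can be tried out on a single-component range-encoded index, where the bitmap for A <= j is stored directly. This is only a sketch of the three rewrites, not the full RangeEval-Opt algorithm for multi-component indexes. The names `build_range_index`, `le`, `lt`, `gt`, `ge` and `rows` are hypothetical, and the column is the base-9 example used earlier.

```python
def build_range_index(column, cardinality):
    """index[j] is an int bitmap whose bit r is set iff column[r] <= j."""
    index = [0] * cardinality
    for r, value in enumerate(column):
        for j in range(value, cardinality):
            index[j] |= 1 << r
    return index

A = [3, 2, 1, 8, 0, 7, 5, 6, 4]
C = 9
ALL = (1 << len(A)) - 1          # bitmap with every row set
LE = build_range_index(A, C)

def le(v):
    """Bitmap for A <= v, the only predicate read from the index."""
    if v < 0:
        return 0
    if v >= C - 1:
        return ALL
    return LE[v]

def lt(v):
    return le(v - 1)             # A < v   ==  A <= v-1

def gt(v):
    return ALL & ~le(v)          # A > v   ==  NOT (A <= v)

def ge(v):
    return ALL & ~le(v - 1)      # A >= v  ==  NOT (A <= v-1)

def rows(bitmap):
    """Row numbers whose bit is set."""
    return [r for r in range(len(A)) if (bitmap >> r) & 1]
```

For instance, `rows(gt(5))` picks out the rows holding 8, 7 and 6 after reading a single stored bitmap and complementing it.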
RangeEval-OPT