Chapter 4: Dimensions, Hierarchies, Operations, Modeling

Prof. Bayer, DWH, Ch.4, SS 2000
Chapter 4.1 Hierarchical Dimensions
Def: Hierarchical Dimensions are composite keys with
an order on the key attributes. Prefixes are allowed as keys.
Ex:
dimension Time = ( Year, Month, Day)
legal keys are:
(Year)
or
(Year, Month)
or
(Year, Month, Day)
Def: Basic facts are values in cells with full foreign keys
Aggregations, Summaries
Def: Aggregations are facts in cells with partial keys. These
facts are derived by aggregation functions. In a cube with
derived facts the aggregation function must be specified.
Ex: Sales on a monthly basis
$\mathrm{Sales}(\mathrm{Year}, \mathrm{Month}) = \sum_{\mathrm{Day}} \mathrm{Sales}(\mathrm{Year}, \mathrm{Month}, \mathrm{Day})$
Aggregation Functions: count, sum, avg, min, max, ...
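A minimal Python sketch of the monthly aggregation above, using sum as the aggregation function. The dictionary layout and the sales figures are invented for illustration and are not data from the lecture.

```python
from collections import defaultdict

# Base facts: full key (Year, Month, Day) -> Sales. Values are made up.
base_facts = {
    (2000, 3, 1): 120.0,
    (2000, 3, 2): 80.0,
    (2000, 4, 1): 50.0,
}

def aggregate_monthly(facts):
    """Sum the daily facts into cells with the partial key (Year, Month)."""
    monthly = defaultdict(float)
    for (year, month, _day), sales in facts.items():
        monthly[(year, month)] += sales
    return dict(monthly)

monthly_sales = aggregate_monthly(base_facts)
```

The result holds one derived fact per (Year, Month) prefix of the Time dimension.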
Note on Aggregations
• Aggregations may be stored explicitly in the
cube, but then they should be secured by integrity
constraints
• Aggregations may be virtual; they must then be
computed on demand when needed
• i.e., classical tradeoff between storage space,
performance, flexibility
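One way to picture an integrity constraint on stored aggregates is a consistency check that recomputes them from the base facts. The sketch below is only an illustration under assumed names (`check_monthly_aggregates`, a dictionary per cube); a real system would enforce this inside the DBMS.

```python
def check_monthly_aggregates(base_facts, stored_monthly, tolerance=1e-9):
    """Return the (Year, Month) keys whose stored sum disagrees with the base facts."""
    recomputed = {}
    for (year, month, _day), sales in base_facts.items():
        recomputed[(year, month)] = recomputed.get((year, month), 0.0) + sales
    keys = set(recomputed) | set(stored_monthly)
    return sorted(
        k for k in keys
        if abs(recomputed.get(k, 0.0) - stored_monthly.get(k, 0.0)) > tolerance
    )

# Invented example data: the stored February value is stale.
base = {(2000, 1, 1): 1.0, (2000, 1, 2): 2.0, (2000, 2, 1): 4.0}
stored = {(2000, 1): 3.0, (2000, 2): 5.0}
violations = check_monthly_aggregates(base, stored)
```

A virtual aggregate needs no such check, at the price of recomputing it for every query.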
Relational Modeling
Expand and complete partial key by ALL
(Year, Month, ALL)
(ALL, Month, ALL)
(ALL, ALL, ALL)
to obtain simple and complete relational keys via special
symbol ALL
Question: SQL to compute complete cube with all
aggregations from base-cube?
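The following is not the SQL asked for in the question, only a Python sketch of what the result should contain: every base fact contributes to each complete key obtained by replacing any subset of its attributes with ALL. The function name `full_cube`, the dictionary representation and the data are assumptions for illustration; sum is used as the aggregation function.

```python
from collections import defaultdict
from itertools import product

ALL = "ALL"  # special symbol that completes a partial key

def full_cube(base_facts):
    """Return the base facts plus all sum-aggregates, keyed by complete keys with ALL."""
    cube = defaultdict(float)
    for key, value in base_facts.items():
        # Each key attribute is either kept or replaced by ALL.
        for mask in product((False, True), repeat=len(key)):
            target = tuple(ALL if hide else k for k, hide in zip(key, mask))
            cube[target] += value
    return dict(cube)

# Invented base cube with keys (Year, Month, Day).
base = {
    (1999, 12, 31): 10.0,
    (2000, 1, 1): 20.0,
    (2000, 1, 2): 5.0,
}
cube = full_cube(base)
```

The all-False mask reproduces the base cell itself, so base facts and aggregates end up in one relation with uniform keys.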
Hierarchy Example
Chapter 4.2: OLAP Operations
Def: Roll-up computes higher aggregations from lower
aggregations or base facts according to hierarchies
Ex: for base facts (Year, Month, Day) there are 3 roll-up
functions:
Roll-up (Year, Month, ALL)
Roll-up (Year, ALL, ALL)
Roll-up (ALL, ALL, ALL)
which are supported in general (canonical roll-ups)
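The three canonical roll-ups can be sketched as follows. The helper `roll_up` and the data are hypothetical; the sketch uses sum, for which a higher aggregation can be computed from a lower one. For avg this shortcut does not work directly, since it needs sum and count.

```python
from collections import defaultdict

ALL = "ALL"

def roll_up(cells, level):
    """Keep the first `level` key attributes, set the rest to ALL, and sum."""
    result = defaultdict(float)
    for key, value in cells.items():
        target = key[:level] + (ALL,) * (len(key) - level)
        result[target] += value
    return dict(result)

# Invented base facts with keys (Year, Month, Day).
base = {
    (1999, 12, 31): 10.0,
    (2000, 1, 1): 20.0,
    (2000, 1, 2): 5.0,
    (2000, 2, 1): 7.0,
}
by_month = roll_up(base, 2)     # Roll-up (Year, Month, ALL)
by_year = roll_up(by_month, 1)  # Roll-up (Year, ALL, ALL), from the monthly cells
total = roll_up(by_year, 0)     # Roll-up (ALL, ALL, ALL)
```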
Additional Roll-ups:
(ALL, Month, ALL) etc.
therefore
$2^3 - 1$ aggregations or in general
$2^m - 1$ aggregations
for m hierarchy levels
Note: see later chapters for the support of arbitrary
aggregations
Note: for m dimensions with $h_1, h_2, \ldots, h_m$ hierarchy levels
there are
$$\prod_{i=1}^{m} (h_i + 1) - 1$$
different aggregations for a given aggregation function.
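Both counts can be checked with a few lines of Python. The function names are chosen here for illustration; `enumerate_hierarchical` lists, per dimension, how many leading levels are kept, and leaves out the base cube.

```python
from itertools import product
from math import prod

def count_arbitrary(m):
    """2^m - 1: any non-empty subset of m key attributes is replaced by ALL."""
    return 2 ** m - 1

def count_hierarchical(levels_per_dimension):
    """prod(h_i + 1) - 1: each dimension keeps a prefix of 0..h_i levels."""
    return prod(h + 1 for h in levels_per_dimension) - 1

def enumerate_hierarchical(levels_per_dimension):
    """Kept prefix length per dimension for every aggregation except the base cube."""
    combos = product(*(range(h + 1) for h in levels_per_dimension))
    return [c for c in combos if c != tuple(levels_per_dimension)]

time_only = count_arbitrary(3)             # (Year, Month, Day)
two_dims = count_hierarchical([2, 3])      # dimensions with 2 and 3 levels
listed = enumerate_hierarchical([2, 3])
```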
Size of base cube
2-dim example
Dim1: (4, 5) = cardinalities of the dimension levels, 4 * 5 = 20
Dim2: (6, 7, 2), 6 * 7 = 42, 42 * 2 = 84
Size of base cube = 20 * 84 = 1680
Size of hierarchically aggregated Cube
Dim1      Dim2         Cells
4  -      6  7  2        336
4  5      6  7  -        840
-  -      6  7  2         84
4  -      6  7  -        168
4  5      6  -  -        120
-  -      6  7  -         42
4  -      6  -  -         24
4  5      -  -  -         20
-  -      6  -  -          6
4  -      -  -  -          4
-  -      -  -  -          1
Total                   1645
(a dash marks a level that is aggregated away)
Number of cells per aggregation function: 1645
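The table can be reproduced by enumerating, for each dimension, every prefix of its levels and multiplying the cardinalities that remain. This is a sketch with names chosen for illustration; the base cube itself is excluded, as in the table.

```python
from itertools import product
from math import prod

def hierarchical_aggregates(dims):
    """Yield (kept prefix per dimension, cell count) for every hierarchical aggregate."""
    prefix_choices = [[tuple(d[:k]) for k in range(len(d) + 1)] for d in dims]
    base = tuple(tuple(d) for d in dims)
    for combo in product(*prefix_choices):
        if combo != base:  # the base cube is not an aggregate
            yield combo, prod(prod(p) for p in combo)

dims = [(4, 5), (6, 7, 2)]
sizes = dict(hierarchical_aggregates(dims))
total_cells = sum(sizes.values())
```

An empty prefix contributes the factor 1, which corresponds to a dimension that is aggregated completely.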
Size of completely aggregated cube
Levels: (4, 5, 6, 7, 2); every level may additionally take the value ALL.
(7 + 1) x (2 + 1) = 24
24 x 6 = 144, 144 + 24 = 168
5 x 168 = 840, 840 + 168 = 6 x 168 = 1008
4 x 1008 = 4032, 4032 + 1008 = 5 x 1008 = 5040
Size of completely aggregated cube: 5040 cells
Computation with binary Tree
[Figure: binary tree over the levels (4, 5, 6, 7, 2). At each level the tree branches into the cardinality of the level or 1 (ALL); each leaf is the product along its path, i.e. the size of one aggregate, e.g. 1680 for the base cube.]
Size of the Cube
Lemma: Consider a data cube with m dimensions that have
$h_1, \ldots, h_m$ hierarchy levels respectively. Let the hierarchy levels of
dimension i have $c_{i,1}, c_{i,2}, \ldots, c_{i,h_i}$ elements respectively.
Then the base cube has
$$\prod_{i=1}^{m} \prod_{j=1}^{h_i} c_{i,j}$$
cells, and the cube with all aggregations has
$$\prod_{i=1}^{m} \prod_{j=1}^{h_i} (c_{i,j} + 1)$$
cells.
Size of the Cube (2)
The aggregated cube is larger than the base cube by the
factor
$$\prod_{i=1}^{m} \prod_{j=1}^{h_i} \frac{c_{i,j} + 1}{c_{i,j}}$$
Size of the hierarchically aggregated Cube
For a hierarchy i with $h_i$ levels and $c_{i,1}, c_{i,2}, \ldots, c_{i,h_i}$ elements per level,
there are
$$1 + c_{i,1} + c_{i,1} c_{i,2} + \ldots + c_{i,1} c_{i,2} \cdots c_{i,h_i}$$
hierarchical aggregation possibilities, i.e.
$$1 + \sum_{j=1}^{h_i} \prod_{k=1}^{j} c_{i,k}$$
possibilities.
Lemma: A hierarchically completely aggregated data cube
has
$$\prod_{i=1}^{m} \left( 1 + \sum_{j=1}^{h_i} \prod_{k=1}^{j} c_{i,k} \right)$$
cells.
Ex: (4 5) (6 7 2)
size of the hierarchically aggregated cube plus base cube
= (1 + 4 + 20) * (1 + 6 + 42 + 84)
= 25 * 133
= 3325
Ex: (4 5) (6 7 2) (8 3)
size of base cube: 40,320
hierarchically aggregated cube plus base:
= (1 + 4 + 20) * (1 + 6 + 42 + 84) * (1 + 8 + 24)
= 3325 * 33
= 109,725
Ex: (4 5) (6 7 2) (8 3) (5 9)
size of base cube: 1,814,400
hierarchically aggregated cube plus base:
= 109,725 * (1 + 5 + 45) = 5,595,975
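The size formulas of the lemmas can be evaluated directly. The sketch below assumes a cube is described by a list of tuples, one tuple of level cardinalities per dimension; the function names are chosen here for illustration.

```python
from math import prod

def base_cube_size(dims):
    """Product of all level cardinalities."""
    return prod(prod(levels) for levels in dims)

def full_cube_size(dims):
    """Every level may also take the value ALL: product of (c + 1)."""
    return prod(prod(c + 1 for c in levels) for levels in dims)

def hierarchical_cube_size(dims):
    """Per dimension 1 + c1 + c1*c2 + ...; the result includes the base cube."""
    total = 1
    for levels in dims:
        prefix, choices = 1, 1
        for c in levels:
            prefix *= c        # size of the prefix ending at this level
            choices += prefix
        total *= choices
    return total

two = [(4, 5), (6, 7, 2)]
three = two + [(8, 3)]
four = three + [(5, 9)]
sizes = [(base_cube_size(d), hierarchical_cube_size(d)) for d in (two, three, four)]
```

For the two-dimensional example the hierarchically aggregated cube (3325 cells) is noticeably smaller than the completely aggregated cube computed by `full_cube_size` (5040 cells).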
Additional comments on aggregations
1. In addition to the size of the complete cube there is a
factor of 5 for the various aggregation functions, e.g.
sum, avg, min, max, count, ...
2. So far we did not consider general restrictions, e.g.
"all Saturdays in March" or "vacation months July
and August", which cross the bounds of hierarchy levels
• Interactive query formulation results in an
unlimited number of aggregations
• Optimization: restrictions corresponding to
hierarchy levels should be pushed down,
since they lead to query boxes
Note: See later chapters for multidimensional indexes and
MHC techniques and optimization of ROLAP-algebra to
support hierarchical canonical aggregations like
Roll-up (Year, Month, ALL)
Roll-up (Year, ALL, ALL)
Roll-up (ALL, ALL, ALL)
but not
Roll-up ( ALL, Month, ALL)
Optimization Problem
Non-hierarchical aggregation, e.g.
March for all years:
decompose into a union of several restrictions, e.g.
$\sum \mathrm{Sales}(\mathrm{Year}, \mathrm{Month}, \mathrm{Day})$
where Month = March and
(Year = 1996 or Year = 1997 or Year = 1998)
See later chapters for the translation into a ROLAP expression and
transformations for optimization.
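The decomposition can be sketched in Python: each (Year, March) pair is one hierarchical restriction, i.e. one query box, and the non-hierarchical result is the sum over these disjoint boxes. Function names and data are invented for illustration; this is not the ROLAP translation announced for the later chapters.

```python
def sales_in_box(facts, year, month):
    """Sum the facts inside one query box: a single month of a single year."""
    return sum(v for (y, m, _d), v in facts.items() if y == year and m == month)

def sales_for_month_over_years(facts, month, years):
    """Non-hierarchical aggregation as a union of disjoint per-year restrictions."""
    return sum(sales_in_box(facts, year, month) for year in years)

# Invented base facts with keys (Year, Month, Day).
facts = {
    (1996, 3, 1): 10.0,
    (1996, 4, 1): 99.0,
    (1997, 3, 15): 20.0,
    (1998, 3, 31): 30.0,
    (1999, 3, 1): 40.0,
}
march_total = sales_for_month_over_years(facts, 3, [1996, 1997, 1998])
```

Because the boxes do not overlap, adding the partial sums counts no fact twice.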
Multiple Hierarchies
e.g. the time hierarchy
Aggregation for a month, e.g. by a covering query box (QB) of weeks
followed by postfiltering
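A sketch of this idea, assuming the facts are clustered by weeks (Monday to Sunday) and keyed by calendar date: first fetch the whole weeks that cover the month, then drop the days outside the month. The function name and the data are hypothetical.

```python
from datetime import date, timedelta

def month_total_via_weeks(facts, year, month):
    """Sum a month by fetching the whole weeks that cover it, then postfiltering."""
    first = date(year, month, 1)
    next_month = date(year + (month == 12), month % 12 + 1, 1)
    last = next_month - timedelta(days=1)
    # Covering range of whole weeks: Monday of the first week to Sunday of the last.
    start = first - timedelta(days=first.weekday())
    end = last + timedelta(days=6 - last.weekday())
    covering = {d: v for d, v in facts.items() if start <= d <= end}
    # Postfilter: keep only the days that really belong to the month.
    return sum(v for d, v in covering.items() if d.year == year and d.month == month)

# Invented daily facts around March 2000.
facts = {
    date(2000, 2, 28): 1.0,
    date(2000, 3, 1): 2.0,
    date(2000, 3, 31): 4.0,
    date(2000, 4, 2): 8.0,
    date(2000, 4, 10): 16.0,
}
march = month_total_via_weeks(facts, 2000, 3)
```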
Navigation Operations
Drill Down: first show a single result for an aggregated
value, e.g. sales per day, then show more detail, e.g.
hourly values for days with very high or very low sales,
in order to plan the working hours of sales people better
Other Examples:
daily sales during the Christmas season
vacation bookings for skiing during Fasching (carnival)
Roll-up: Compute Aggregations
Slicing
Selection of a smaller data cube, or even reduction of a
multidimensional data cube to fewer dimensions, by a point
restriction in some dimension (which becomes the pivot element)
[Figure: data cube with the dimensions Region (Haidhausen, Schwabing, Zentrum), Time (May, June, July) and Product (Racer, Future, TriaRacer, RacerJunior). The slice for May is shown as a two-dimensional table with the rows 47, 11, 8 (Haidhausen); 53, 9, 14 (Schwabing); 77, 26, 15 (Zentrum).]
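Slicing by a point restriction can be sketched as a filter that also drops the restricted dimension from the key. The function `slice_cube` is a hypothetical helper, and the measures below are invented; they are not the values of the figure.

```python
def slice_cube(cube, axis, value):
    """Fix one dimension to a single value and drop it from the key."""
    return {
        key[:axis] + key[axis + 1:]: measure
        for key, measure in cube.items()
        if key[axis] == value
    }

# Keys are (Region, Month, Product); the measures are illustrative only.
cube = {
    ("Haidhausen", "May", "Racer"): 3,
    ("Haidhausen", "June", "Racer"): 4,
    ("Schwabing", "May", "Future"): 5,
    ("Zentrum", "July", "Racer"): 6,
}
may_slice = slice_cube(cube, axis=1, value="May")
```

The three-dimensional cube becomes a two-dimensional Region x Product table for the fixed month.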
Dicing (German: würfeln)
Rotate the result to show another view, e.g. by exchanging
rows and columns
Rows: Product (Racer, Future, TriaRacer, RacerJunior); columns: Region

Haidhausen  Schwabing  Zentrum
    47          53        77
    11           9        26
     8          14        15
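Exchanging rows and columns of a two-dimensional result can be sketched as follows. The nested-dictionary layout, the name `pivot` and the numbers are assumptions for illustration.

```python
def pivot(table):
    """Exchange rows and columns of a {row: {column: value}} table."""
    rotated = {}
    for row, columns in table.items():
        for column, value in columns.items():
            rotated.setdefault(column, {})[row] = value
    return rotated

# Invented values; rows are regions, columns are products.
by_region = {
    "Haidhausen": {"Racer": 3, "Future": 1},
    "Schwabing": {"Racer": 5, "Future": 2},
}
by_product = pivot(by_region)
```

Applying `pivot` twice gives back the original view, so the rotation loses no information.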
Slice management
precomputing and caching of several slices for later or
special use, e.g. for a special sales person
Chapter 4.3 Modeling
Purpose: analysis of business processes, key figures
(German: Kennzahlen) for managers to support decisions (DSS)
Steps of Decision Process:
1. Which business processes to model and analyze?
2. What are the measures, where do they come from?
3. Which level of detail, e.g. minutes as in SAP? Which
precision is required for OLAP?
4. Common properties of measures to determine dimensions?
Brand, time, geographic region, product group? Dependencies
between levels of hierarchies?
5. Attributes of dimensions, e.g.
• screen size of TV
• engine displacement (cc) and horsepower (PS) for cars
• focal length for camera
Problem: how common are properties and dimensions? Properties
that are not common cannot be modeled by levels of
dimensions. At GfK they are called features (up to 50) and are
numbered, with a meaning that depends on the specific dimension
element, e.g.
TV: screen size, color, audio system, ...
Car: transmission, cc, PS, #cyl, ...
6. Constant or changing attributes of dimensions? E.g.
• new models of car makers
• new power sources: electrical, hydrogen, solar
Attributes are rather stable, but changes should still be planned
ahead (e.g. mergers like Daimler-Chrysler)
7. Sparsity: one hypercube or several, i.e. a multicube model?
This influences storage requirements, query formulation and
performance, and cannot be hidden easily from the user, maybe by
views?
8. Caching and management of aggregates?
[Figure: time plotted against the number of aggregates (0% to 100%), with curves for maintenance costs, average response time and total costs; the optimal number of aggregates is marked.]
Chapter 4.4 Comparison of OLAP Architectures
1. MOLAP: Multidimensional OLAP
2. ROLAP: Relational OLAP
3. HOLAP: Hybrid OLAP
MOLAP Architecture
[Figure: three users on top of data marts held in an MDDBMS, on top of the data warehouse database (relational)]
MDDBMS in ANSI/X3/SPARC
External level: individual submodels
Conceptual level: dimensions with dimension elements, hierarchies
Internal level: data / storage structures
Logical components of an MDDBMS
ROLAP Architecture
[Figure: three users, each with a ROLAP product, on top of relational data marts, on top of the data warehouse database (relational)]
HOLAP Architecture
[Figure: a user with a HOLAP product on top of two data warehouse databases, one relational and one multidimensional]
Reasons for MOLAP
• performance
• write access
• Data Marts
• functional power
Reasons for ROLAP
• scalability
• flexible precomputations, partial aggregates
• parallelism
• DB management and ACID