Chapter 4: Dimensions, Hierarchies, Operations, Modeling
Prof. Bayer, DWH, Ch.4, SS 2000

Chapter 4.1 Hierarchical Dimensions

Def: Hierarchical dimensions are composite keys with an order on the key attributes. Prefixes are allowed as keys.
Ex: dimension Time = (Year, Month, Day)
legal keys are: (Year) or (Year, Month) or (Year, Month, Day)
Def: Basic facts are values in cells with full foreign keys.

Aggregations, Summaries

Def: Aggregations are facts in cells with partial keys. These facts are derived by aggregation functions. In a cube with derived facts the aggregation function must be specified.
Ex: Sales on a monthly basis
$\mathrm{Sales}(Year, Month) = \sum_{Day} \mathrm{Sales}(Year, Month, Day)$
Aggregation functions: count, sum, avg, min, max, ...

Note on Aggregations

• Aggregations may be stored explicitly in the cube, but then they should be secured by integrity constraints
• Aggregations may be virtual and must then be computed on demand when needed
• i.e., the classical tradeoff between storage space, performance, and flexibility

Relational Modeling

Expand and complete partial keys with the special symbol ALL, e.g.
(Year, Month, ALL)
(ALL, Month, ALL)
(ALL, ALL, ALL)
to obtain simple and complete relational keys.
Question: which SQL computes the complete cube with all aggregations from the base cube?

Hierarchy Example
[Figure: hierarchy example]

Chapter 4.2: OLAP Operations

Def: Roll-up computes higher aggregations from lower aggregations or base facts according to hierarchies.
Ex: for base facts (Year, Month, Day) there are 3 roll-up functions:
Roll-up (Year, Month, ALL)
Roll-up (Year, ALL, ALL)
Roll-up (ALL, ALL, ALL)
which are supported in general (canonical roll-ups).

Additional roll-ups: (ALL, Month, ALL) etc.,
therefore there are $2^3 - 1$ aggregations, or in general $2^m - 1$ aggregations for $m$ hierarchy levels.
Note: see later chapters for the support of arbitrary aggregations.
Note: for $m$ dimensions with $h_1, h_2, \ldots, h_m$ hierarchy levels there are
$\prod_{i=1}^{m} (h_i + 1) - 1$
different aggregations for a given aggregation function.

Size of base cube

2-dim example, cardinalities of the dimension levels:
Dim1: (4, 5)
Dim2: (6, 7, 2)
$(4 \cdot 5) \cdot (6 \cdot 7 \cdot 2) = 20 \cdot 84 = 1680$ = size of base cube

Size of hierarchically aggregated Cube

Number of cells per aggregation function ("-" marks a level replaced by ALL):

Dim1      Dim2        Cells
4  5      6  7  2
4  -      6  7  2       336
4  5      6  7  -       840
-  -      6  7  2        84
4  -      6  7  -       168
4  5      6  -  -       120
-  -      6  7  -        42
4  -      6  -  -        24
4  5      -  -  -        20
-  -      6  -  -         6
4  -      -  -  -         4
-  -      -  -  -         1
Total                  1645

Size of completely aggregated cube

Level cardinalities: 4 5 6 7 2
[Figure: step-by-step construction of the completely aggregated cube, one hierarchy level at a time]
(7 + 1) x (2 + 1) = 24
24 x 6 = 144, 144 + 24 = 168
5 x 168 = 840, 840 + 168 = 6 x 168 = 1008
4 x 1008 = 4032, 4032 + 1008 = 5 x 1008 = 5040

Computation with binary Tree

[Figure: binary tree over the level cardinalities 4, 5, 6, 7, 2. At each level one branch multiplies by the cardinality and the other by 1 (ALL). The leaves below the Dim1 prefix 4 x 5 = 20 are 20, 40, 140, 280, 120, 240, 840, 1680.]

Size of the Cube

Lemma: Given a data cube with $m$ dimensions with $h_1, \ldots, h_m$ hierarchy levels, respectively. Let the hierarchy levels of dimension $i$ have $c_{i1}, c_{i2}, \ldots, c_{ih_i}$ elements, respectively. Then the base cube has
$\prod_{i=1}^{m} \prod_{j=1}^{h_i} c_{ij}$ cells
and the cube with all aggregations has
$\prod_{i=1}^{m} \prod_{j=1}^{h_i} (c_{ij} + 1)$ cells.

Size of the Cube (2)

The aggregated cube is larger than the base cube by the factor
$\prod_{i=1}^{m} \prod_{j=1}^{h_i} \frac{c_{ij} + 1}{c_{ij}}$

Size of the hierarchically aggregated Cube

For a hierarchy $i$ with $h_i$ levels and $c_{i1}, c_{i2}, \ldots, c_{ih_i}$ elements per level, there are
$1 + c_{i1} + c_{i1} c_{i2} + \ldots + c_{i1} c_{i2} \cdots c_{ih_i}$
hierarchical aggregation possibilities, i.e.
$1 + \sum_{j=1}^{h_i} \prod_{k=1}^{j} c_{ik}$ possibilities.
Lemma: A hierarchically completely aggregated data cube has
$\prod_{i=1}^{m} \left( 1 + \sum_{j=1}^{h_i} \prod_{k=1}^{j} c_{ik} \right)$ cells.

Ex: (4 5) (6 7 2)
size of the hierarchically aggregated cube plus base cube:
= (1 + 4 + 20) * (1 + 6 + 42 + 84) = 25 * 133 = 3325

Ex: (4 5) (6 7 2) (8 3)
size of base cube: 40,320
hierarchically aggregated cube plus base:
= (1 + 4 + 20) * (1 + 6 + 42 + 84) * (1 + 8 + 24) = 3325 * 33 = 109,725

Ex: (4 5) (6 7 2) (8 3) (5 9)
size of base cube: 1,814,400
hierarchically aggregated cube plus base:
= 109,725 * (1 + 5 + 45) = 5,595,975

Additional comments on aggregations

1. In addition to the size of the complete cube there is a factor of 5 for the various aggregation functions, e.g. sum, avg, min, max, count, ...
2. So far we did not consider general restrictions, e.g. "all Saturdays in March" or "vacation months July and August", which cross the bounds of hierarchy levels.
Interactive query formulation results in an unlimited number of aggregations.
Optimization: restrictions corresponding to hierarchy levels should be pushed down, since they lead to query boxes.

Note: See later chapters for multidimensional indexes, MHC techniques and the optimization of the ROLAP algebra to support hierarchical canonical aggregations like
Roll-up (Year, Month, ALL)
Roll-up (Year, ALL, ALL)
Roll-up (ALL, ALL, ALL)
but not
Roll-up (ALL, Month, ALL)

Optimization Problem

Non-hierarchical aggregation, e.g. March for all years:
decompose into a union of several restrictions, e.g.
$\sum \mathrm{Sales}(Year, Month, Day)$ where Month = March and (Year = 1996 or Year = 1997 or Year = 1998)
See later for the translation into a ROLAP expression and transformations for optimization.

Multiple Hierarchies

e.g. the time hierarchy
Aggregation for month, e.g.
by covering query boxes (QB) of weeks and postfiltering

Navigation Operations

Drill down: first show a single result for an aggregated value, e.g. sales per day, then show hourly values for days with very high or very low sales, in order to plan the working hours of salespeople better.
Other examples:
daily sales during the Christmas season
vacation bookings for skiing at Fasching

Roll-up: Compute Aggregations
[Figure: roll-up example]

Slicing

Selection of a smaller data cube, or even reduction of a multidimensional data cube to fewer dimensions, by a point restriction in some dimension (which becomes the pivot element).
[Figure: data cube with axes Region (Haidhausen, Schwabing, Zentrum), Time (May, June, July) and Product (Racer Future, Tria Racer, Racer Junior); the slice for Time = May is shown as a table]

              Racer Future   Tria Racer   Racer Junior
Haidhausen         47            11             8
Schwabing          53             9            14
Zentrum            77            26            15

Dicing (German: würfeln)

Rotate the result to show another view, e.g. by exchanging rows and columns.

              Haidhausen   Schwabing   Zentrum
Racer Future       47          53         77
Tria Racer         11           9         26
Racer Junior        8          14         15

Slice management: precomputing and caching of several slices for later or special use, e.g. for a particular salesperson.

Chapter 4.3 Modeling

Purpose: analysis of business processes, key figures (Kennzahlen) for managers to support decisions (DSS)
Steps of the decision process:
1. Which business processes to model and analyze?
2. What are the measures, and where do they come from?
3. Which degree of detail, e.g. minutes as in SAP? Which precision is required for OLAP?
4. Common properties of measures to determine dimensions? Brand, time, geographic region, product group? Dependencies between levels of hierarchies?
5. Attributes of dimensions, e.g.
• screen size of TV
• cc and PS (horsepower) for cars
• focal length for camera
Problem: how common are properties and dimensions?
Non-common properties cannot be modeled by levels of dimensions. At GfK they are called features (up to 50) and are numbered, with a meaning that depends on the specific dimension element, e.g.
TV: screen size, color, audio system
Car: transmission, cc, PS, #cyl
...
6. Constant or changing attributes of dimensions? E.g.
• new models of car makers
• new power sources: electrical, hydrogen, solar
Attributes are rather stable, but changes should still be planned ahead (e.g. mergers like DaimlerChrysler).
7. Sparsity: one hypercube or several, i.e. a multicube model? This influences storage requirements, query formulation and performance, and cannot easily be hidden from the user; maybe by views?
8. Caching and management of aggregates?
[Figure: time and cost plotted against the number of aggregates (0% to 100%), with curves for total costs, maintenance costs and average response time, and the optimal number of aggregates marked]

Chapter 4.4 Comparison of OLAP Architectures

1. MOLAP: Multidimensional OLAP
2. ROLAP: Relational OLAP
3. HOLAP: Hybrid OLAP

MOLAP Architecture
[Figure: MOLAP architecture with the layers users, data marts (MDDBMS) and data warehouse database (relational)]

MDDBMS in ANSI/X3/SPARC
[Figure: three-level architecture with external level, conceptual level and internal level; labels: individual submodels, dimensions with dimension elements, hierarchies, data/storage structures]

Logical components of an MDDBMS
[Figure: logical components of an MDDBMS]

ROLAP Architecture
[Figure: ROLAP architecture with the layers users (each with a ROLAP product), relational data marts and data warehouse database (relational)]

HOLAP Architecture
[Figure: HOLAP architecture with a user, a HOLAP product, a relational data warehouse database and a multidimensional data warehouse database]

Reasons for MOLAP
• performance
• write access
• data marts
• functional power

Reasons for ROLAP
• scalability
• flexible precomputations, partial aggregates
• parallelism
• DB management and ACID
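
Worked check of the cube-size lemmas

The cube-size lemmas and examples of Chapter 4.2 can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the function names and the representation of a cube as a list of tuples of level cardinalities are assumptions made for illustration only.

```python
from math import prod

# Assumption for illustration: a cube is described by one tuple per
# dimension, holding the level cardinalities c_i1, ..., c_ih_i,
# e.g. [(4, 5), (6, 7, 2)] for the 2-dim example.


def base_cube_size(dims):
    """Product of all level cardinalities c_ij."""
    return prod(prod(levels) for levels in dims)


def complete_cube_size(dims):
    """Cube with all aggregations: product of (c_ij + 1)."""
    return prod(prod(c + 1 for c in levels) for levels in dims)


def hierarchical_cube_size(dims):
    """Hierarchically aggregated cube plus base cube:
    product over dimensions of 1 + sum_j prod_{k<=j} c_ik."""
    total = 1
    for levels in dims:
        possibilities = 1      # the ALL level of this dimension
        running = 1
        for c in levels:
            running *= c       # c_i1 * ... * c_ij
            possibilities += running
        total *= possibilities
    return total


def hierarchical_aggregation_count(dims):
    """Number of hierarchical aggregations: prod(h_i + 1) - 1."""
    return prod(len(levels) + 1 for levels in dims) - 1


if __name__ == "__main__":
    dims = [(4, 5), (6, 7, 2)]
    print(base_cube_size(dims))                  # 1680
    print(complete_cube_size(dims))              # 5040
    print(hierarchical_cube_size(dims))          # 3325
    print(hierarchical_aggregation_count(dims))  # 11
```

Appending the dimensions (8, 3) and (5, 9) to the list reproduces the larger examples: base cubes of 40,320 and 1,814,400 cells, and hierarchically aggregated cubes of 109,725 and 5,595,975 cells.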