Course: M0584 - Data Warehouse
Year: Sep - 2009

The Data Warehouse and Design
Sessions 3-4

Summary
• The design of the data warehouse begins with the data model
• The primary concern of the data warehouse developer is managing volume
• The data warehouse is fed data as it passes from the legacy operational environment. Data goes through a complex process of conversion, reformatting, and integration as it passes from the legacy operational environment into the data warehouse environment
• The data model exists at three levels: high level, mid level, and low level

Bina Nusantara University

Summary (cont’d)
• The creation of a data warehouse record is triggered by an activity or an event that has occurred in the operational environment
• A profile record is a composite record made up of many different historical activities
• The star join is a database design technique that is sometimes mistakenly applied to the data warehouse environment

Beginning with Operational Data
• Three types of loads are made into the data warehouse from the operational environment:
– Archival data
– Data currently contained in the operational environment
– Ongoing changes to the data warehouse environment from the changes (updates) that have occurred in the operational environment since the last refresh

Beginning with Operational Data (cont’d)
• Five common techniques are used to limit the amount of operational data scanned:
1. Scan data that has been timestamped
2. Scan a ‘delta’ file
3. Scan a log file or an audit file
4. Modify application code
5. Compare (“rub together”) a ‘before’ and an ‘after’ image of the operational file

Data/Process Model and the Architected Environment
• The process model applies only to the operational environment
• The data model applies to both the operational environment and the data warehouse environment
• A process model typically consists of the following (in whole or in part):
– Functional decomposition
– Context-level zero diagram
– Data Flow Diagram
– Structure Chart
– State Transition Diagram
– HIPO chart
– Pseudocode

The Data Warehouse and Data Models

The Data Warehouse Data Model
• There are three levels of data modeling:
– High-level modeling (ERD)
– Mid-level modeling (DIS = Data Item Set)
– Low-level modeling (physical model)

Snapshots in the Data Warehouse
• Snapshots are created as a result of some event occurring.
• The snapshot triggered by an event has four basic components:
– A key
– A unit of time
– Primary data that relates only to the key
– Secondary data captured as part of the snapshot process that has no direct relationship to the primary data or key

Complexity of Transformation and Integration
• At first glance, when data is moved from the legacy environment to the data warehouse environment, it appears that nothing more is going on than simple extraction of data from one place to the next

Complexity of Transformation and Integration (cont’d)
• Some of the functionality required as data passes from the operational, legacy environment to the data warehouse environment:
– The extraction of data from the operational environment to the data warehouse environment requires a change in technology (DBMS technology)
– The selection of data may be very complex
– Operational input keys need to be restructured and converted
– Nonkey data is reformatted
– Data is cleansed
– Multiple input sources of data exist and must be merged
– Key resolution must be done
– Input files need to be resequenced
– Default values must be supplied

Profile Records
• Profile records represent snapshots of data, just like individual activity records. The difference between the two is that individual activity records in the data warehouse represent a single event, while profile records in the data warehouse represent multiple events.
• A profile record is created from the grouping of many detailed records
• See Figure 3.43 for details

Managing Volume
• In many cases, the volume of data to be managed in the data warehouse is a significant issue. Creating profile records is an effective technique for managing the volume of data.
  The reduction in data volume achieved by condensing detailed operational records into a profile record is remarkable.
• It is possible (indeed, normal) to achieve a reduction of two to three orders of magnitude by creating profile records in a data warehouse.
• Because of this benefit, the ability to create profile records is a powerful one that should be in the portfolio of every data architect

Creating Multiple Profile Records
• Multiple profile records can be created from the same detail. In the case of a phone company, individual call records can be used to create a customer profile record, a district traffic profile record, a line analysis profile record, and so forth.

Creating Multiple Profile Records (cont’d)
• Profile records can go into the data warehouse or into a data mart that is fed by the data warehouse. When profile records go into the data warehouse, they are for general-purpose use. When they go into a data mart, they are customized for the department that will use the data mart.
• The aggregation of the operational records into a profile record is almost always done on the operational server.

Direct Access of Data Warehouse Data
• See Figure 3.46

Indirect Access of Data Warehouse Data
• See Figure 3.47

Star Joins
• Data warehouse design is decidedly a world in which a normalized approach is the proper one. There are several very good reasons why normalization produces the optimal design for a data warehouse:
– It produces flexibility
– It fits well with very granular data
– It is not optimized for any given set of processing requirements
– It fits very nicely with the data model

Star Joins (cont’d)
• A different approach to database design sometimes mentioned in the context of data warehousing is the multidimensional approach.
  This approach entails star joins, fact tables, and dimensions. The multidimensional approach applies exclusively to data marts, not to the data warehouse.
• Unlike the data warehouse, data marts are very much shaped by requirements. To build a data mart, you have to know a lot about the processing requirements that surround the data mart.
• Once those requirements are known, the data mart can be shaped into an optimal star join structure.

Star Joins (cont’d)
• Data warehouses are essentially different because they serve a very large community, and as such, they are not optimized for the convenience or performance of any one set of requirements.
• Data warehouses are shaped around the corporate requirements for information, not the departmental requirements for information.
• Therefore, creating a star join for the data warehouse is a mistake because the end result will be a data warehouse optimized for one community at the expense of all other communities.

Star Joins (cont’d)
• See Figure 3.51
• See Figure 3.52
• See Figure 3.53
• See Figure 3.54
• See Figure 3.55
• See Figure 3.56

Supporting the ODS
• In general, there are three classes of ODS: class I, class II, and class III.
• In a class I ODS, updates of data from the operational environment to the ODS are synchronous.
• In a class II ODS, the updates between the operational environment and the ODS occur within a two-to-three-hour time frame.

Supporting the ODS (cont’d)
• In a class III ODS, the synchronization of updates between the operational environment and the ODS occurs overnight.
• But there is another type of ODS structure, a class IV ODS, in which updates into the ODS from the data warehouse are unscheduled.
• See Figure 3.57
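
The four ODS classes above differ mainly in where updates come from and how quickly they arrive. The following is a minimal sketch of that distinction, not a definitive rule. The function name `ods_class` and the three threshold constants are hypothetical; in particular, treating "synchronous" as at most one second and "overnight" as at most 24 hours are assumptions made only for illustration.

```python
from datetime import timedelta

# Hypothetical thresholds (assumptions, not taken from the slides):
SYNC_TOLERANCE = timedelta(seconds=1)    # stand-in for "synchronous" (class I)
CLASS_II_WINDOW = timedelta(hours=3)     # upper end of the two-to-three-hour frame
OVERNIGHT_WINDOW = timedelta(hours=24)   # stand-in for "overnight" (class III)


def ods_class(latency=None, fed_from_warehouse=False):
    """Return the ODS class ("I" to "IV") for a given update path.

    latency: delay between an operational update and its arrival in the ODS.
    fed_from_warehouse: True when the ODS is updated from the data warehouse
    on an unscheduled basis (class IV); latency is ignored in that case.
    """
    if fed_from_warehouse:
        return "IV"
    if latency is None:
        raise ValueError("latency is required when the ODS is fed from the operational environment")
    if latency <= SYNC_TOLERANCE:
        return "I"
    if latency <= CLASS_II_WINDOW:
        return "II"
    if latency <= OVERNIGHT_WINDOW:
        return "III"
    raise ValueError("latency exceeds the overnight window assumed for class III")


print(ods_class(timedelta(hours=2)))       # II
print(ods_class(fed_from_warehouse=True))  # IV
```

Keeping the thresholds as named constants makes the assumptions easy to adjust if a different reading of "synchronous" or "overnight" is wanted.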
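
As a worked illustration of the profile records described in the Profile Records and Managing Volume slides, the sketch below condenses individual call records into one customer profile record per customer per month, following the phone company case. It is only an illustration under stated assumptions: the `CallRecord` fields, the monthly grouping, and the function name `build_customer_profiles` are hypothetical and do not come from the slides.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CallRecord:
    """One detailed activity record (hypothetical layout)."""
    customer_id: str
    month: str            # e.g. "2009-09"
    duration_sec: int
    long_distance: bool


def build_customer_profiles(calls):
    """Group many detail records into one profile record per (customer, month)."""
    profiles = defaultdict(
        lambda: {"calls": 0, "total_sec": 0, "long_distance_calls": 0}
    )
    for call in calls:
        profile = profiles[(call.customer_id, call.month)]
        profile["calls"] += 1
        profile["total_sec"] += call.duration_sec
        profile["long_distance_calls"] += int(call.long_distance)
    return dict(profiles)


calls = [
    CallRecord("C1", "2009-09", 60, False),
    CallRecord("C1", "2009-09", 120, True),
    CallRecord("C2", "2009-09", 30, False),
]
profiles = build_customer_profiles(calls)
print(len(calls), "detail records ->", len(profiles), "profile records")
```

The same detail records could be grouped by a different key, such as district or line, to produce the other profile records mentioned in the Creating Multiple Profile Records slide.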