Department of Informatics, University of Rijeka
Radmile Matejčić 2, 51000 Rijeka, Croatia
http://www.inf.uniri.hr

Data Warehouse design models in higher education courses

Patrizia Poščić, Associate Professor
[email protected]
Danijela Subotić, Teaching Assistant
[email protected]

Overview
• Introduction
• DW architecture
• Modeling practices
– Entity-relationship model
– Data Vault model
– Dimensional model
• Conclusion

Introduction
• Course: Selected Topics in Databases
• Graduate study, 1st year
• Data warehouse (DW) design as a topic
• Integrating several data modeling practices into a complete DW design
• Practical assignment at the end of the semester

DW architecture
[Figure: DW architecture]

Modeling practices
• Modeling existing database (DB) sources
– Entity-relationship model
– Relational model
• Modeling the enterprise data warehouse (EDW) as the system of record
– Data Vault model
• Modeling data marts (DMs)
– Dimensional model

Business case
• We use a business case dealing with a DW for an outdoor and adventure equipment sales company
• All data model examples on the following slides were made in Erwin 9.5 and are based on the IDEF1X notation

Entity-Relationship (ER) model
[Figures: ER models of the Sales DB and the Marketing DB]

Data Vault model
• A data modeling method that supports the design of data warehouses for long-term storage of historical data collected from various data sources
• Based on the assumption that the DW environment is constantly changing
• Emphasizes tracking the origin of the data stored in the database through an empirically defined set of metadata
• Enables tracing each value back to its source and tracking the history of its changes

Data Vault model
• No distinction is made between good and bad data: all data is stored at all times, regardless of whether it conforms to business rules, so no information is lost
• Structural data are explicitly separated from descriptive attributes, even when both come from the same source
• The model adapts flexibly to changes in the business environment
• Supports gap analysis and trend projections

Data Vault model
• Any change is implemented as an independent extension of the existing model:
– the changes do not affect current applications
– all versions of the application can be based on the same, evolving DB
– all versions of the model are subsets of the DV model
• Enables fast parallel loading, which reduces overall costs
• Aims at flexibility and performance

Data Vault model
• Hub
• Link
• Satellite
[Figure: Data Vault model example, tables and attributes]
– H_CUSTOMER (hub): customerID, loadDTS, recSource
– H_ORDER (hub): orderID, loadDTS, recSource
– H_CLASS (hub): classID, loadDTS, recSource
– L_CST_ORD (link): customerID (FK), orderID (FK), loadDTS, recSource
– L_ORD_ORD_CLS (link): orderID (FK), classID (FK), loadDTS, recSource
– S_CUST_NAME (satellite): customerID (FK), loadDTS, custName, custCreditLimit, loadEndDTS, recSource
– S_CUST_CONTACT (satellite): customerID (FK), loadDTS, custAddress, custZipCode, custCity, custProvince, custCountry, custPhone, custEmail, loadEndDTS, recSource
– S_ORDER (satellite): orderID (FK), loadDTS, orderDate, deliveryDate, paymentDiscount, orderTotal, loadEndDTS, recSource
– S_CLASS (satellite): classID (FK), loadDTS, classType, loadEndDTS, recSource

Data Vault model
[Figure: Data Vault model]

Data Vault model (main advantages)
• Inserts, deletes, and updates of rows are all implemented as additions (nothing is ever lost or overwritten)
• Structural changes to and within data sources result in model expansion, mainly through new links, without structural reconstruction of existing DW elements (architectural stability)
• Enables rapid parallel data loads

Dimensional model
• Used almost universally for the design and presentation of DMs
• Distinguished by the star schema design:
– a central fact table containing a composite (multi-part) key and one or more numerical business measures
– facts (sets of measurements) should be tracked at the lowest granularity of the data
– facts are surrounded by a rich context of dimensions
– dimension tables are denormalized, have a simple key, and store business attributes as textual information

Dimensional model
[Figure: Dimensional model example, tables and attributes]
– D_CUSTOMER (dimension): customerID, nameCust, sexCust, classCust, branchCust, city, region
– D_AGENT (dimension): agentID, nameAg, typeAg, agency, masterAg
– D_TIME (dimension): timeID, date, dayOfWeek, week, month, quartal, campSeason, year, campYear
– D_PRODUCT (dimension): subsegmentID, subsegment, segment, subcategory, category
– F_ORDER (fact): customerID (FK), timeID (FK), agentID (FK), subsegmentID (FK), paymentDue, DeliveryDue, quantity, discount, returns, cancellations, totalNetOrd, price
– F_INVOICE (fact): customerID (FK), timeID (FK), agentID (FK), subsegmentID (FK), quantity, discount, price

Conclusion
• We presented a set of complementary data warehouse design models that can enable well-integrated DW solutions for relational DB implementations
• All models are based on a common notation (IDEF1X) and built in a single design tool (Erwin)
• Our goal is to give students a compact body of modeling knowledge in the fields of DBs and DWs
• Students build on and further develop their theoretical knowledge and practical modeling skills through the educational process

Thank you for your attention!
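The hub and satellite structure and the insert-only loading described on the Data Vault slides can be sketched with Python's standard sqlite3 module. This is a minimal sketch, not the authors' implementation: it covers only H_CUSTOMER and S_CUST_NAME, and the column types, ISO-8601 text timestamps, the hypothetical helper load_customer_name, and the hypothetical source name "SalesDB" are all assumptions. Following a common Data Vault convention, a new version closes the previous satellite row by setting its loadEndDTS; descriptive values are never overwritten.

```python
import sqlite3

# Hypothetical, simplified DDL for one hub and one satellite from the
# Data Vault example; column types and timestamp format are assumptions.
DDL = """
CREATE TABLE H_CUSTOMER (
    customerID INTEGER PRIMARY KEY,  -- business key
    loadDTS    TEXT NOT NULL,        -- first time the key was loaded
    recSource  TEXT NOT NULL         -- source system of the record
);
CREATE TABLE S_CUST_NAME (
    customerID      INTEGER NOT NULL REFERENCES H_CUSTOMER (customerID),
    loadDTS         TEXT NOT NULL,
    custName        TEXT,
    custCreditLimit REAL,
    loadEndDTS      TEXT,            -- NULL marks the current version
    recSource       TEXT NOT NULL,
    PRIMARY KEY (customerID, loadDTS)
);
"""


def load_customer_name(conn, customer_id, name, credit_limit, load_dts, source):
    """Hypothetical loader: adds new versions, never overwrites descriptive values."""
    # Register the business key in the hub only the first time it appears.
    conn.execute(
        "INSERT OR IGNORE INTO H_CUSTOMER (customerID, loadDTS, recSource) "
        "VALUES (?, ?, ?)",
        (customer_id, load_dts, source),
    )
    current = conn.execute(
        "SELECT custName, custCreditLimit FROM S_CUST_NAME "
        "WHERE customerID = ? AND loadEndDTS IS NULL",
        (customer_id,),
    ).fetchone()
    if current == (name, credit_limit):
        return  # no change in the source, so no new satellite row
    # Close the previous version by setting only its loadEndDTS ...
    conn.execute(
        "UPDATE S_CUST_NAME SET loadEndDTS = ? "
        "WHERE customerID = ? AND loadEndDTS IS NULL",
        (load_dts, customer_id),
    )
    # ... and append the new version as a separate row.
    conn.execute(
        "INSERT INTO S_CUST_NAME VALUES (?, ?, ?, ?, NULL, ?)",
        (customer_id, load_dts, name, credit_limit, source),
    )


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
load_customer_name(conn, 1, "Customer A", 5000, "2024-01-01T00:00:00", "SalesDB")
load_customer_name(conn, 1, "Customer A", 5000, "2024-03-01T00:00:00", "SalesDB")
load_customer_name(conn, 1, "Customer A", 8000, "2024-06-01T00:00:00", "SalesDB")
history = conn.execute(
    "SELECT custCreditLimit, loadDTS, loadEndDTS FROM S_CUST_NAME ORDER BY loadDTS"
).fetchall()
print(history)
```

With these sample calls the hub holds a single row for customer 1, the unchanged second load adds nothing, and the satellite keeps both versions: the original credit limit, end-dated at the third load, and the current one with loadEndDTS still NULL.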
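The star schema described on the dimensional model slides can be illustrated with a small SQLite sketch. It assumes a reduced F_ORDER (only timeID, subsegmentID and quantity) joined to two of its dimensions, D_TIME and D_PRODUCT, with a subset of their columns; the SQLite types and all sample rows are invented for illustration. The query shows the typical star join: the fact table is joined to each dimension through its simple key, rows are grouped by descriptive dimension attributes, and the numerical measure is summed.

```python
import sqlite3

# Minimal star schema: F_ORDER with two of its dimensions (D_TIME, D_PRODUCT).
# Only a subset of the slide's columns is kept; types and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE D_TIME (
    timeID INTEGER PRIMARY KEY,        -- simple key
    date   TEXT,
    month  INTEGER,
    year   INTEGER
);
CREATE TABLE D_PRODUCT (
    subsegmentID INTEGER PRIMARY KEY,  -- simple key
    subsegment   TEXT,
    segment      TEXT,
    subcategory  TEXT,
    category     TEXT                  -- whole hierarchy in one denormalized table
);
CREATE TABLE F_ORDER (
    timeID       INTEGER REFERENCES D_TIME (timeID),
    subsegmentID INTEGER REFERENCES D_PRODUCT (subsegmentID),
    quantity     INTEGER,              -- numerical business measure
    PRIMARY KEY (timeID, subsegmentID) -- composite key built from dimension keys
);
""")
conn.executemany("INSERT INTO D_TIME VALUES (?, ?, ?, ?)", [
    (1, "2024-01-15", 1, 2024),
    (2, "2024-02-10", 2, 2024),
])
conn.executemany("INSERT INTO D_PRODUCT VALUES (?, ?, ?, ?, ?)", [
    (10, "Tents 2P", "Tents", "Camping", "Outdoor"),
    (20, "Ropes 10mm", "Ropes", "Climbing", "Adventure"),
])
conn.executemany("INSERT INTO F_ORDER VALUES (?, ?, ?)", [
    (1, 10, 3),
    (2, 10, 2),
    (2, 20, 5),
])

# Star join: group by dimension attributes, aggregate the fact measure.
report = conn.execute("""
    SELECT p.category, t.month, SUM(f.quantity) AS qty
    FROM F_ORDER AS f
    JOIN D_TIME AS t ON f.timeID = t.timeID
    JOIN D_PRODUCT AS p ON f.subsegmentID = p.subsegmentID
    GROUP BY p.category, t.month
    ORDER BY p.category, t.month
""").fetchall()
print(report)
```

Because D_PRODUCT keeps the whole product hierarchy in one denormalized table, rolling up or drilling down (for example from category to subcategory) only means changing the grouped column, with no extra joins.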