ppt

Department of Informatics, University of Rijeka
Radmile Matejčić 2, 51000 Rijeka, Croatia
http://www.inf.uniri.hr
Data Warehouse design models in
higher education courses
Patrizia Poščić, Associate Professor
[email protected]
Danijela Subotić, Teaching Assistant
[email protected]
Overview
• Introduction
• DW architecture
• Modeling practices
– Entity-relationship model
– Data Vault model
– Dimensional model
• Conclusion
2
Introduction
• Selected Topics in Databases
• Graduate study, 1st year
• Data warehouse (DW) design as a topic
• Integrating several data modeling practices for complete
DW design
• Practical assignment at the end of the semester
3
DW architecture
4
Modeling practices
• Modeling of existing database (DB) sources
– Entity-relationship model
– Relational model
• Modeling enterprise data warehouse (EDW) as system
of records
– Data Vault model
• Modeling data marts (DM)
– Dimensional model
5
Business case
• We use a business case which
deals with a DW for the outdoor and
adventure equipment sales
company
• All data model examples (which are
shown on following slides) are
made in Erwin 9.5 and are based
on IDEF1X
6
Entity-Relationship (ER) model
Sales DB
Marketing DB
7
Data Vault model
• A data modeling method that supports design of data
warehouses for long-term storage of historical data
collected from various data sources
• Based on the assumption that the DW environment is in
constant change
• It highlights the need for tracking the origin of data
contained in the database, through empirically defined
set of metadata
• Enables tracking the value back to the source and
tracking the history of changes
8
Data Vault model
• There is no difference between good and bad data - all
the data is stored at all times, regardless of whether they
are adaptable to business rules - avoiding the loss of
information
• The structural data are explicitly separated from
descriptive attributes, regardless of whether they come
from the same source
• Model flexible to changes in business environment
• Allows for a gap analysis and trend projections
9
Data Vault model
• Any change is implemented in the model as an
independent extension of the existing model:
– the changes do not affect current applications
– all versions of the application can be based on the same,
developing DB
– all versions of the model are a subset of the DV model
• Enables fast parallel loading which reduces the overall
costs
• Aiming at flexibility and performance
10
Data Vault model
• Hub
• Link
• Satellite
S_CUST_CONTACT
S_CUST_NAME
customerID (FK)
loadDTS
custName
custCreditLimit
loadEndDTS
recSource
customerID (FK)
loadDTS
H_CUSTOMER
customerID
loadDTS
recSource
L_CST_ORD
customerID (FK)
orderID (FK)
S_ORDER
custAddress
custZipCode
custCity
custProvince
custCountry
custPhone
custEmail
loadEndDTS
recSource
loadDTS
recSource
H_ORDER
orderID
loadDTS
recSource
classID (FK)
loadDTS
classType
loadEndDTS
recSource
orderID (FK)
loadDTS
orderDate
deliveryDate
paymentDiscount
orderTotal
loadEndDTS
recSource
S_CLASS
L_ORD_ORD_CLS
orderID (FK)
classID (FK)
loadDTS
recSource
H_CLASS
classID
loadDTS
recSource
11
Data Vault
model
12
Data Vault model
(main advantages)
• Inserts, deletes, or updates of rows are implemented
only as additions (nothing ever get lost/overwritten)
• Structural changes of and in data sources results in
model expansion, principally by new links and without
structural reconstruction of existing DW elements
(architectural stability)
• Enables rapid parallel data loads
13
Dimensional model
• Practically universally used for DM design presentation
• Distinguished by star schema design
– centralized fact table, which contains a multi-layered keys and
one or more numerical business measures
– fact (set of measurement) needs to be tracked for a lowest
granularity of data
– fact is surrounded with a rich context of dimensions
– dimension tables are denormalized, they have a simple key and
they store business attributes in the form of textual information
14
Dimensional model
D_CUSTOMER
D_AGENT
customerID
agentID
nameCust
sexCust
classCust
branchCust
city
region
nameAg
typeAg
agency
masterAg
F_ORDER
customerID (FK)
timeID (FK)
agentID (FK)
paymentDue
DeliveryDue
subsegmentID (FK)
quantity
discount
returns
cancellations
totalNetOrd
price
F_INVOICE
D_TIME
timeID
date
dayOfWeek
week
month
quartal
campSeason
year
campYear
customerID (FK)
timeID (FK)
agentID (FK)
subsegmentID (FK)
quantity
discount
price
D_PRODUCT
subsegmentID
subsegment
segment
subcategory
category
15
Conclusion
• We presented a set of complementary data warehouse
design models which may enable well integrated DW
solutions for relational DB implementations
• Models based on a common notation (IDEF1X) and in a
single design tool (ErWin)
• Our goal is to present students with a compact set of
modelling knowledge in the field of DB and DW
• Upgrade and further develop theoretical knowledge and
practical modelling skills through the educational process
16
Thank You for your
attention!