The Data Warehouse and Design
Summary
• The design of the data warehouse begins with the data model
• The primary concern of the data warehouse developer is managing volume
• The data warehouse is fed data as it passes from the legacy operational
environment. Data goes through a complex process of conversion,
reformatting, and integration as it passes from the legacy operational
environment into the data warehouse environment
• The data model exists at three levels: high level, mid level, and low level
• The creation of a data warehouse record is triggered by an activity or an event
that has occurred in the operational environment
• A profile record is a composite record made up of many different historical
activities
• The star join is a database design technique that is sometimes mistakenly
applied to the data warehouse environment
Beginning with Operational Data
• Three types of loads are made into the data
warehouse from the operational environment:
– Archival data
– Data currently contained in the operational environment
– Ongoing changes to the data warehouse environment
resulting from the updates that have occurred in the
operational environment since the last refresh
Beginning with Operational Data (cont’d)
• Five common techniques are used to limit
the amount of operational data scanned:
1. Scan data that has been timestamped
2. Scan a ‘delta’ file
3. Scan a log file or an audit file
4. Modify application code
5. Compare (‘rub together’) a ‘before’ and an ‘after’ image of the
operational file
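
Two of the techniques listed above, scanning timestamped data and comparing before and after images, can be sketched roughly as follows. This is a minimal illustration, not a prescribed implementation: the field name `updated_at`, the function names, and the dictionary-based record layout are all assumptions made for the example.

```python
from datetime import datetime

def scan_timestamped(rows, last_refresh):
    """Technique 1: keep only rows stamped after the last refresh.

    Assumes each row is a dict carrying a hypothetical 'updated_at' datetime.
    """
    return [row for row in rows if row["updated_at"] > last_refresh]

def diff_images(before, after):
    """Technique 5: compare a 'before' and an 'after' image of a file.

    Both images are assumed to be dicts mapping a record key to its contents.
    Returns three dicts: inserted, updated, and deleted records.
    """
    inserted = {k: v for k, v in after.items() if k not in before}
    deleted = {k: v for k, v in before.items() if k not in after}
    updated = {
        k: v for k, v in after.items()
        if k in before and before[k] != v
    }
    return inserted, updated, deleted
```

The timestamp scan only touches rows that changed, while the image comparison has to read both full images, which is one reason it is usually treated as a last resort.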
Data/Process Model and the Architected Environment
• The process model applies only to the operational
environment
• The data model applies to both the operational
environment and the data warehouse environment
• A process model typically consists of the following (in
whole or in part):
– Functional decomposition
– Context-level zero diagram
– Data flow diagram
– Structure chart
– State transition diagram
– HIPO chart
– Pseudocode
The Data Warehouse and Data Models
The Data Warehouse data model
• There are three levels of data modeling
– High-level modeling (ERD)
– Mid-level modeling (DIS = Data Item Set)
– Low-level modeling (physical model)
Snapshots in the Data Warehouse
• Snapshots are created as a result of some event
occurring.
• The snapshot triggered by an event has four basic
components:
– A key
– A unit of time
– Primary data that relates only to the key
– Secondary data captured as part of the snapshot process
that has no direct relationship to the primary data or key
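
The four components of a snapshot can be pictured as a simple record type. The sketch below is only illustrative: the class name `Snapshot`, the function `take_snapshot`, and the event fields (`order_id`, `occurred_at`, `amount`, `status`, `context`) are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any

@dataclass(frozen=True)
class Snapshot:
    key: str                      # identifies the subject of the snapshot
    moment: datetime              # unit of time: when the event occurred
    primary: dict[str, Any]       # data that relates only to the key
    secondary: dict[str, Any] = field(default_factory=dict)  # incidental data

def take_snapshot(event: dict[str, Any]) -> Snapshot:
    """Build a snapshot from a hypothetical operational event."""
    return Snapshot(
        key=event["order_id"],
        moment=event["occurred_at"],
        primary={"amount": event["amount"], "status": event["status"]},
        # Anything else captured alongside the event, if present.
        secondary=dict(event.get("context", {})),
    )
```

Keeping the secondary data in its own field makes it explicit that it was merely captured at the same moment and is not an attribute of the key.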
Complexity of Transformation and Integration
• At first glance, when data is moved from
the legacy environment to the data
warehouse environment, it appears that
nothing more is going on than simple
extraction of data from one place to the next
Complexity of Transformation and Integration (cont’d)
• Some of the functionality required as data passes from the
operational, legacy environment to the data warehouse
environment:
– The extraction of data from the operational environment to the data
warehouse environment requires a change in technology (DBMS
technology)
– The selection of data may be very complex
– Operational input keys need to be restructured and converted
– Nonkey data is reformatted
– Data is cleansed
– Multiple input sources of data exist and must be merged
– Key resolution must be done
– Input files need to be resequenced
– Default values must be supplied
– And many others…
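
A few of the functions listed above (key conversion, nonkey reformatting, cleansing, default values, merging, key resolution, and resequencing) are sketched below on toy records. All field names, the `MM/DD/YYYY` input date format, the `USD` default, and the "later source wins" merge rule are assumptions chosen for the example; a real transformation layer would take these from the source systems and the data model.

```python
from datetime import datetime

DEFAULTS = {"currency": "USD"}  # hypothetical default value

def transform(record):
    """Convert one operational record into a warehouse-style record."""
    return {
        # Key restructuring: normalize spacing and case so that keys from
        # different sources resolve to the same warehouse key.
        "customer_key": str(record["cust_id"]).strip().upper(),
        # Nonkey reformatting: MM/DD/YYYY (assumed input) to ISO 8601.
        "order_date": datetime.strptime(
            record["order_date"], "%m/%d/%Y"
        ).date().isoformat(),
        # Cleansing: collapse stray whitespace and normalize capitalization.
        "name": " ".join(record.get("name", "").split()).title(),
        # Default value supplied when the source leaves the field empty.
        "currency": record.get("currency") or DEFAULTS["currency"],
    }

def merge_sources(*batches):
    """Merge transformed batches; later batches win on key collisions.

    The result is resequenced by warehouse key.
    """
    merged = {}
    for batch in batches:
        for rec in batch:
            merged[rec["customer_key"]] = rec
    return [merged[key] for key in sorted(merged)]
```

Even this toy version shows why the move is more than simple extraction: every field passes through at least one rule before it lands in the warehouse.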
Profile records
• Profile records represent snapshots of data,
just like individual activity records
• A profile record is created from the
grouping of many detailed records
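
As a rough illustration of how detailed records are grouped into a profile record, the sketch below rolls individual activity records up into one record per customer per month. The activity layout (`customer`, `date`, `minutes`) and the chosen summary fields are hypothetical.

```python
from collections import defaultdict

def build_profiles(activities):
    """Group detailed activity records into per-customer, per-month profiles.

    Each activity is assumed to be a dict with 'customer', an ISO 'date'
    string (YYYY-MM-DD), and an integer 'minutes' value.
    """
    profiles = defaultdict(
        lambda: {"activity_count": 0, "total_minutes": 0, "longest": 0}
    )
    for activity in activities:
        month = activity["date"][:7]  # 'YYYY-MM'
        profile = profiles[(activity["customer"], month)]
        profile["activity_count"] += 1
        profile["total_minutes"] += activity["minutes"]
        profile["longest"] = max(profile["longest"], activity["minutes"])
    return dict(profiles)
```

The profile keeps far fewer rows than the detail it was built from, at the cost of losing the individual activities.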