Course: M0584 - Data Warehouse
Year: Sep 2009
Migration to the
Architected Environment
Sessions 10 - 11
A migration plan
• The beginning point for the migration plan is a
corporate data model. This model represents the
information needs of the corporation.
• The corporate data model may be built internally, or it
may have been generated from a generic data model.
Bina Nusantara
A migration plan (cont’d)
• The corporate data model needs to identify (at a
minimum!) the following:
– Major subjects of the corporation
– Definition of the major subjects of the corporation
– Relationships between the major subjects
– Groupings of keys and attributes that more fully
represent the major subjects, including the
following:
• Attributes of the major subjects
• Keys of the major subjects
• Repeating groups of keys and attributes
– Connections between major subject areas
– Subtyping relationships
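The items a corporate data model must identify can be pictured as a small data structure. The following is a minimal sketch only: the class names (`Subject`, `CorporateDataModel`), the field names, and the sample subjects are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Subject:
    """One major subject of the corporation (hypothetical structure)."""
    name: str
    definition: str
    keys: list[str] = field(default_factory=list)
    attributes: list[str] = field(default_factory=list)
    # Repeating groups: group name -> keys/attributes that repeat together.
    repeating_groups: dict[str, list[str]] = field(default_factory=dict)
    # Names of the subtypes of this subject.
    subtypes: list[str] = field(default_factory=list)


@dataclass
class CorporateDataModel:
    subjects: dict[str, Subject] = field(default_factory=dict)
    # Relationships between major subjects as (source, target, description).
    relationships: list[tuple[str, str, str]] = field(default_factory=list)

    def add_subject(self, subject: Subject) -> None:
        self.subjects[subject.name] = subject

    def relate(self, source: str, target: str, description: str) -> None:
        # Only subjects already present in the model can be related.
        for name in (source, target):
            if name not in self.subjects:
                raise KeyError(f"unknown subject: {name}")
        self.relationships.append((source, target, description))


model = CorporateDataModel()
model.add_subject(Subject(
    name="Customer",
    definition="A party that buys products from the corporation",
    keys=["customer_id"],
    attributes=["name", "segment"],
    repeating_groups={"addresses": ["address_type", "street", "city"]},
    subtypes=["RetailCustomer", "CorporateCustomer"],
))
model.add_subject(Subject(
    name="Sale",
    definition="A completed transaction for one or more products",
    keys=["sale_id"],
    attributes=["sale_date", "amount"],
))
model.relate("Customer", "Sale", "places")
```

Keeping relationships in the model object rather than inside each subject is a design choice made for this sketch; it keeps the "relationships between the major subjects" visible in one place.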
A migration plan (cont’d)
• As a rule, the corporate data model identifies corporate
information at a high level. From the corporate data
model a lower-level model is built. The lower-level model
identifies details that have been glossed over by the
corporate data model.
• This mid-level model is built one subject area at a
time. It is not built all at once because doing so
takes too long.
A migration plan (cont’d)
• Some reasons for excluding derived data and DSS data
from the corporate data model:
– Derived data and DSS data change frequently
– These forms of data are created from atomic data
– They frequently are deleted altogether
– There are many variations in the creation of derived
data and DSS data
A migration plan (cont’d)
Defining the System of record
• The system of record is defined in terms of the
corporation’s existing system
• The system of record is nothing more than the
identification of the ‘best’ data the corporation has that
resides in the legacy operational environment or in the
web-based e-business environment
• The data model is used as a benchmark for determining
what the best data is.
A migration plan (cont’d)
Defining the System of record
• The ‘best’ source of existing data or data found
in the web-based e-business environment is
determined by the following criteria:
• What data in the existing systems or web-based
e-business environment:
– Is the most complete?
– Is the most timely?
– Is the most accurate?
– Is the closest to the source of entry?
– Conforms most closely to the structure of the data
model, in terms of keys, attributes, and groupings
of data attributes?
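The criteria above can be applied as a simple scoring exercise. The sketch below is illustrative only: the 0 to 5 rating scale, the equal weighting of criteria, and the source names `billing_db` and `order_entry` are assumptions, not part of the slides. In practice the choice may be made per data element rather than per system.

```python
from dataclasses import dataclass

# The five criteria from the slide, as hypothetical score names.
CRITERIA = (
    "completeness",
    "timeliness",
    "accuracy",
    "closeness_to_entry",
    "conformance_to_model",
)


@dataclass
class CandidateSource:
    name: str
    scores: dict  # criterion name -> rating on an assumed 0-5 scale

    def total(self) -> int:
        # Equal weighting is an assumption; the slides give no weights.
        return sum(self.scores[c] for c in CRITERIA)


def pick_system_of_record(candidates):
    """Return the candidate with the highest total score."""
    return max(candidates, key=CandidateSource.total)


candidates = [
    CandidateSource("billing_db", dict(
        completeness=4, timeliness=3, accuracy=5,
        closeness_to_entry=2, conformance_to_model=4)),
    CandidateSource("order_entry", dict(
        completeness=3, timeliness=5, accuracy=4,
        closeness_to_entry=5, conformance_to_model=3)),
]
best = pick_system_of_record(candidates)
print(best.name, best.total())  # order_entry 20
```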
A migration plan (cont’d)
Defining the System of record
• A short list of technological changes includes the
following:
– A change in DBMS
– A change in operating systems
– The need to merge data from different DBMSs and
operating systems
– The capture of the web-based data in the web logs
– A change in basic data formats
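As one illustration of a change in basic data formats, the sketch below converts bytes stored in an EBCDIC code page into text. The slides do not name any particular format; EBCDIC, the choice of Python's built-in `cp037` codec, and the sample record are assumptions made for the example.

```python
def ebcdic_to_text(raw: bytes) -> str:
    """Decode bytes stored in EBCDIC (code page 037, assumed) into a string."""
    return raw.decode("cp037")


# Simulate a legacy record by encoding sample text as EBCDIC bytes.
legacy_bytes = "CUSTOMER 0042".encode("cp037")

# The raw bytes differ from the ASCII encoding of the same text,
# so they must be converted before loading.
text = ebcdic_to_text(legacy_bytes)
print(text)
```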
A migration plan (cont’d)
Defining the System of record
• The next step: Design the data warehouse
A migration plan (cont’d)
Defining the System of record
• Principally, the following needs to be done:
– An element of time needs to be added
– All purely operational data needs to be eliminated
– Referential Integrity relationships need to be turned
into artifacts
– Derived data that is frequently needed is added to the
design
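The four design steps can be sketched as a transformation from an operational row to a warehouse row. This is one possible reading, not a prescribed implementation: the field names (`lock_flag`, `session_id`, `line_total`, `snapshot_date`) and the sample order are hypothetical.

```python
from datetime import date

# Hypothetical fields that only matter to the running operational system.
OPERATIONAL_ONLY = {"lock_flag", "session_id"}


def to_warehouse_row(op_row: dict, snapshot_date: date) -> dict:
    # 1. Eliminate purely operational data.
    row = {k: v for k, v in op_row.items() if k not in OPERATIONAL_ONLY}

    # 2. Add an element of time: the row describes the data as of this date.
    row["snapshot_date"] = snapshot_date.isoformat()

    # 3. Referential integrity becomes an artifact: customer_id is kept as a
    #    plain value recording the relationship as it stood on snapshot_date.
    #    Nothing here enforces that the customer still exists.

    # 4. Add frequently needed derived data (assumed here: the line total).
    row["line_total"] = row["quantity"] * row["unit_price"]
    return row


operational_row = {
    "order_id": 17,
    "customer_id": 42,
    "quantity": 3,
    "unit_price": 2.5,
    "lock_flag": True,
    "session_id": "abc",
}
warehouse_row = to_warehouse_row(operational_row, date(2009, 9, 30))
print(warehouse_row)
```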
A migration plan (cont’d)
Defining the System of record
• The data warehouse, once designed, is
organized by subject area. Typical subject areas
are as follows:
– Customer
– Product
– Sale
– Account
– Activity
– Shipment
A migration plan (cont’d)
Defining the System of record
• The next step: Design and build the interfaces between
the system of record (in the operational environment) and
the data warehouse
• The interfaces populate the data warehouse on a regular
basis.
A migration plan (cont’d)
Defining the System of record
• A word of caution: If you wait for existing systems to be
cleaned up, you will never build a data warehouse.
• One observation worth making at this point concerns how
often data is refreshed into the data warehouse.
• As a rule, data warehouse data should be refreshed no
more frequently than every 24 hours.
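The 24-hour rule can be expressed as a guard that an interface checks before loading. This is a minimal sketch; the function name `refresh_due` and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta

# Rule of thumb from the slide: refresh no more often than every 24 hours.
MIN_REFRESH_INTERVAL = timedelta(hours=24)


def refresh_due(last_refresh: datetime, now: datetime) -> bool:
    """Return True only if at least 24 hours have passed since the last load."""
    return now - last_refresh >= MIN_REFRESH_INTERVAL


last_load = datetime(2009, 9, 1, 2, 0)
print(refresh_due(last_load, datetime(2009, 9, 1, 14, 0)))  # False: 12 hours
print(refresh_due(last_load, datetime(2009, 9, 2, 2, 0)))   # True: 24 hours
```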
Strategic Considerations
Methodology and Migration
A data-driven development methodology
• Why have methodologies been disappointing?
– Methodologies generally show a flat, linear flow of activities
– Methodologies usually show activities as occurring once and
only once.
– Methodologies usually describe a prescribed set of activities to
be done
– Methodologies often tell how to do something, not what needs to
be done
– Methodologies often do not distinguish between the sizes of the
systems being developed under the methodology
– Methodologies often mix project management concerns with
design/development activities to be done
– And many more
Data-Driven Methodology
• A data-driven methodology does not take an
application-by-application approach to the
development of systems