External/Unstructured Data and
the Data Warehouse
External/Unstructured Data in the
Data Warehouse
• Several issues relate to the use and storage of
external and unstructured data in the data
warehouse:
1. The frequency of its availability
2. Its total lack of discipline
3. Its unpredictability
• There are many methods to capture and store unstructured
information, such as:
1. Near-line storage
2. Creating two stores of unstructured data
Meta Data and External Data
• Meta data is vital because through it external data is registered,
accessed, and controlled in the data warehouse environment. The
importance of meta data is best understood by noting what it typically
encompasses:
– Document ID
– Date of entry into the warehouse
– Description of the document
– Source of the document
– Date of source of the document
– Classification of the document
– Index words
– Purge date
– Physical location reference
– Length of the document
– Related references
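The meta data items listed above can be pictured as one record per external document, kept in a small registry. The following is only a minimal sketch: the class names `ExternalDocumentMetadata` and `MetadataRegistry`, the field names, and the unit of `length` are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class ExternalDocumentMetadata:
    """One meta data record per external document (hypothetical layout)."""
    document_id: str
    entry_date: date            # date of entry into the warehouse
    description: str
    source: str
    source_date: date           # date of the source of the document
    classification: str
    index_words: List[str]
    purge_date: date
    physical_location: str      # physical location reference
    length: int                 # unit is an assumption (e.g. bytes)
    related_references: List[str] = field(default_factory=list)


class MetadataRegistry:
    """Registers external documents and looks them up by index word."""

    def __init__(self) -> None:
        self._records: Dict[str, ExternalDocumentMetadata] = {}

    def register(self, record: ExternalDocumentMetadata) -> None:
        # Refuse duplicate IDs so each document is registered exactly once.
        if record.document_id in self._records:
            raise ValueError(f"duplicate document ID: {record.document_id}")
        self._records[record.document_id] = record

    def find_by_index_word(self, word: str) -> List[ExternalDocumentMetadata]:
        # Case-insensitive match against each record's index words.
        wanted = word.lower()
        return [
            record for record in self._records.values()
            if wanted in (w.lower() for w in record.index_words)
        ]
```

In this sketch the registry holds only the meta data. The document itself can live wherever the physical location reference points, for example in near-line storage.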
Storing External/
Unstructured Data
• External data and unstructured data can actually be
stored in the data warehouse if it is convenient and
cost-effective to do so.
• Storing external data and unstructured data
requires considerable resources.
• By associating external data and unstructured
data with a data warehouse, the external data and
the unstructured data become available to all
parts of the organization, such as finance,
marketing, accounting, sales, engineering, and so
forth.
Modeling and
External/Unstructured Data
• What is the role of the data model with respect to
external data? See below (Figure 8.6).
Archiving External Data
• Every piece of information – external or
otherwise – has a useful lifetime.
• Once past that lifetime, it is not economical
to keep the information. An essential part of
managing external data is deciding what the
useful lifetime of the data is.
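One way to act on a useful lifetime is to record a purge date per document and periodically separate the documents that have passed it. The sketch below assumes exactly that. The function name `split_by_lifetime` and the dictionary layout are hypothetical, and what happens to expired documents (archive or delete) is left open.

```python
from datetime import date
from typing import Dict, List, Tuple


def split_by_lifetime(
    purge_dates: Dict[str, date], today: date
) -> Tuple[List[str], List[str]]:
    """Return (still_useful, expired) document IDs, each sorted by ID.

    A document counts as expired once its purge date is on or before
    `today`. That boundary rule is an assumption of this sketch.
    """
    still_useful: List[str] = []
    expired: List[str] = []
    for doc_id in sorted(purge_dates):
        if purge_dates[doc_id] <= today:
            expired.append(doc_id)
        else:
            still_useful.append(doc_id)
    return still_useful, expired
```

Passing `today` as an argument rather than reading the system clock keeps the check repeatable.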
Comparing Internal Data to
External Data
• One of the most useful things to do with
external data is to compare it to internal
data over a period of time. The comparison
gives management a unique perspective.
For instance, being able to contrast
immediate and personal activities against
global activities and trends allows an
executive to gain insights that are simply not
possible elsewhere.
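As a rough illustration of such a comparison, the sketch below contrasts the period-over-period growth of an internal measure with that of an external one. The function names and the choice of growth rate as the yardstick are assumptions made here for illustration, not something the slides prescribe.

```python
from typing import List, Sequence


def period_growth(values: Sequence[float]) -> List[float]:
    """Fractional change between consecutive periods.

    Assumes no period has a value of zero (it is used as a divisor).
    """
    return [(curr - prev) / prev for prev, curr in zip(values, values[1:])]


def growth_gap(internal: Sequence[float], external: Sequence[float]) -> List[float]:
    """Internal growth minus external growth, period by period.

    A positive entry means the internal measure grew faster than the
    external one in that period.
    """
    if len(internal) != len(external):
        raise ValueError("both series must cover the same periods")
    return [
        i - e
        for i, e in zip(period_growth(internal), period_growth(external))
    ]
```

For example, `internal` could be a company's quarterly sales and `external` an industry-wide figure for the same quarters. Both series must cover the same periods in the same order.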