Module 4: DATA WAREHOUSING
WHAT IS DW?
DW: a storage area for processed data that is integrated across different sources (operational data and external data).
A data warehouse is a subject-oriented, integrated, non-volatile, time-variant collection of data in support of management's decisions. (William H. Inmon)
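SUBJECT-ORIENTED
Data is organized around the major subjects of the business rather than around individual operational applications.
Example for an insurance company:
[Figure: operational data held by application (Claims Processing System, Accounting System, Billing System) versus data warehouse data organized by subject (Claims, Losses, Premium).]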
INTEGRATED
Data is stored once, in a single integrated location (e.g., for an insurance company):
[Figure: customer data stored in several databases (Auto Policy Processing System, Fire Policy Processing System, FACTS, LIFE, Commercial and Accounting Applications) is integrated into a single Data Warehouse Database where Subject = Customer.]
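To make the integration step concrete, here is a minimal Python sketch, assuming two hypothetical extracts (from the auto and fire policy systems) whose field names and gender codes differ. Each source row is mapped into one common layout, and rows with the same customer ID are merged so the customer is stored once. The field names, the GENDER_CODES table, and the helper functions are illustrative assumptions, not part of any real system.

```python
# Hypothetical extracts: each operational system stores customer data in its
# own layout and encoding (all field names and codes are assumptions).
auto_policy_rows = [
    {"cust_no": "C001", "name": "ALICE SMITH", "sex": "F"},
    {"cust_no": "C002", "name": "BOB JONES", "sex": "M"},
]
fire_policy_rows = [
    {"customer_id": "C001", "full_name": "Alice Smith", "gender": "Female"},
    {"customer_id": "C003", "full_name": "Carol White", "gender": "Female"},
]

# One agreed encoding for the warehouse.
GENDER_CODES = {"M": "M", "F": "F", "Male": "M", "Female": "F"}


def from_auto(row):
    """Map an auto-policy row into the common warehouse layout."""
    return {"customer_id": row["cust_no"], "name": row["name"].title(),
            "gender": GENDER_CODES[row["sex"]], "source": "auto"}


def from_fire(row):
    """Map a fire-policy row into the common warehouse layout."""
    return {"customer_id": row["customer_id"], "name": row["full_name"].title(),
            "gender": GENDER_CODES[row["gender"]], "source": "fire"}


def integrate(*mapped_sources):
    """Store each customer once, remembering which systems supplied it."""
    customers = {}
    for rows in mapped_sources:
        for rec in rows:
            entry = customers.setdefault(rec["customer_id"], {
                "customer_id": rec["customer_id"], "name": rec["name"],
                "gender": rec["gender"], "sources": set()})
            entry["sources"].add(rec["source"])
    return customers


warehouse_customers = integrate(map(from_auto, auto_policy_rows),
                                map(from_fire, fire_policy_rows))

for cid, c in sorted(warehouse_customers.items()):
    print(cid, c["name"], c["gender"], sorted(c["sources"]))
# C001 Alice Smith F ['auto', 'fire']
# C002 Bob Jones M ['auto']
# C003 Carol White F ['fire']
```

Customer C001 appears in both source systems but ends up as a single warehouse record with one name format and one gender code.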
TIME-VARIANT
Data is stored as a series of snapshots or views, each recording the state of the data at a point in time, so that changes across time are preserved.
[Figure: Data Warehouse Data, where the key of each record includes a time element alongside the data.]
Data is tagged with some element of time, e.g., creation date or as-of date.
Data is kept available online for long periods of time (for example, five or more years) to support trend analysis and forecasting.
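A minimal Python sketch of time-variant storage, assuming a hypothetical premium measure per customer: each load appends a snapshot tagged with an as-of date instead of overwriting earlier values, so the history stays available for trend analysis. The Snapshot class and the helper names are illustrative only.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Snapshot:
    as_of: date        # time element; (as_of, customer_id) identifies a row
    customer_id: str
    premium: float     # hypothetical measure


history = []  # every snapshot ever loaded; nothing is overwritten


def take_snapshot(as_of, current_state):
    """Append one snapshot row per customer for the given as-of date."""
    for customer_id, premium in current_state.items():
        history.append(Snapshot(as_of, customer_id, premium))


def premium_trend(customer_id):
    """Return (year, premium) pairs in time order for one customer."""
    return [(s.as_of.year, s.premium)
            for s in sorted(history, key=lambda s: s.as_of)
            if s.customer_id == customer_id]


take_snapshot(date(2021, 12, 31), {"C001": 500.0, "C002": 320.0})
take_snapshot(date(2022, 12, 31), {"C001": 540.0, "C002": 300.0})
take_snapshot(date(2023, 12, 31), {"C001": 585.0})

print(premium_trend("C001"))  # [(2021, 500.0), (2022, 540.0), (2023, 585.0)]
```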
NON-VOLATILE
Existing data in the warehouse is not overwritten or updated.
[Figure: in the production environment, applications insert, update, and delete data in production databases; in the data warehouse environment, data from production databases and external sources is loaded into the warehouse database and is then read-only.]
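One illustrative way to enforce non-volatility, assuming SQLite as a stand-in warehouse database: new data can be loaded with INSERT and then read, while triggers reject UPDATE and DELETE on existing rows. This is only a sketch; production warehouses may enforce the rule differently (for example, through access privileges), and the table and trigger names here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_premium (
    as_of_date  TEXT NOT NULL,
    customer_id TEXT NOT NULL,
    premium     REAL NOT NULL,
    PRIMARY KEY (as_of_date, customer_id)
);
-- Existing warehouse rows may not be overwritten or removed.
CREATE TRIGGER fact_premium_no_update BEFORE UPDATE ON fact_premium
BEGIN
    SELECT RAISE(ABORT, 'warehouse data is non-volatile');
END;
CREATE TRIGGER fact_premium_no_delete BEFORE DELETE ON fact_premium
BEGIN
    SELECT RAISE(ABORT, 'warehouse data is non-volatile');
END;
""")


def load(rows):
    """Load (insert) new rows; this is the only permitted write."""
    with conn:
        conn.executemany("INSERT INTO fact_premium VALUES (?, ?, ?)", rows)


def try_modify(sql):
    """Attempt an in-place change and report whether it was rejected."""
    try:
        with conn:
            conn.execute(sql)
        return "allowed"
    except sqlite3.DatabaseError as exc:
        return f"rejected: {exc}"


load([("2022-12-31", "C001", 540.0), ("2023-12-31", "C001", 585.0)])
print(try_modify("UPDATE fact_premium SET premium = 0"))
print(try_modify("DELETE FROM fact_premium"))
print(conn.execute("SELECT COUNT(*) FROM fact_premium").fetchone()[0])  # 2
```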
DATA WAREHOUSE DESIGN
A DW can be developed using three different methodologies:
Bottom-up design
Top-down design
Hybrid design
ARCHITECTURE
EXTRACT-TRANSFORM-LOAD (ETL)
ETL is a process in data warehousing responsible for
pulling data out of the source systems and placing it into
a data warehouse.
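The three stages can be sketched end to end in Python, assuming two hypothetical sources (a billing CSV extract and a list of claim records) and an in-memory SQLite database standing in for the data warehouse. The function names, column names, and fact_amount table are illustrative assumptions.

```python
import csv
import io
import sqlite3

# Hypothetical source data in two different formats.
BILLING_CSV = "cust,amount\nC001,120.50\nC002,80\n"
CLAIMS_ROWS = [{"customer": "C001", "claim_amount": 300.0}]


def extract():
    """Pull raw data out of each source system."""
    billing = list(csv.DictReader(io.StringIO(BILLING_CSV)))
    return billing, CLAIMS_ROWS


def transform(billing, claims):
    """Convert both sources into one format: (customer_id, kind, amount)."""
    rows = [(r["cust"], "billing", float(r["amount"])) for r in billing]
    rows += [(r["customer"], "claim", float(r["claim_amount"])) for r in claims]
    return rows


def load(rows, conn):
    """Place the transformed rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS fact_amount "
                 "(customer_id TEXT, kind TEXT, amount REAL)")
    with conn:
        conn.executemany("INSERT INTO fact_amount VALUES (?, ?, ?)", rows)


warehouse = sqlite3.connect(":memory:")
load(transform(*extract()), warehouse)

print(warehouse.execute(
    "SELECT customer_id, SUM(amount) FROM fact_amount "
    "GROUP BY customer_id ORDER BY customer_id").fetchall())
# [('C001', 420.5), ('C002', 80.0)]
```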
ETL involves the following tasks:
Extracting the data: data is pulled from the source systems (SAP, ERP, other operational systems), and data from the different source systems is converted into one consolidated data warehouse format that is ready for transformation processing.
Transforming the data: this may involve the following tasks:
Applying business rules (so-called derivations, e.g., calculating new measures and dimensions),
Cleaning (e.g., mapping NULL to 0 or "Male" to "M" and
"Female" to "F" etc.),
Filtering (e.g., selecting only certain columns to load),
Splitting a column into multiple columns and vice versa,
Joining together data from multiple sources (e.g., lookup,
merge),
Transposing rows and columns,
Applying any kind of simple or complex data
validation (e.g., if the first 3 columns in a row are
empty then reject the row from processing)
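The transformation tasks listed above can be illustrated with a small Python sketch. It assumes hypothetical raw rows and a hypothetical region lookup table, and applies validation, column filtering, cleaning (NULL to 0, "Male"/"Female" to "M"/"F"), column splitting, a lookup join, a derived measure, and a transposition. ETL tools express these rules in their own ways; every name here is for illustration only.

```python
# Hypothetical rows after extraction; column names are illustrative.
raw_rows = [
    {"cust_id": "C001", "full_name": "Alice Smith", "gender": "Female",
     "region_code": "N", "q1_sales": 100, "q2_sales": None, "notes": "vip"},
    {"cust_id": "C002", "full_name": "Bob Jones", "gender": "Male",
     "region_code": "S", "q1_sales": 50, "q2_sales": 70, "notes": ""},
    {"cust_id": None, "full_name": None, "gender": None,
     "region_code": "N", "q1_sales": 10, "q2_sales": 20, "notes": ""},
]
REGION_LOOKUP = {"N": "North", "S": "South"}  # second source used for the join
GENDER_MAP = {"Male": "M", "Female": "F"}
KEEP = ("cust_id", "full_name", "gender", "region_code", "q1_sales", "q2_sales")


def is_valid(row):
    # Validation: reject the row if its first three columns are all empty.
    return any(row[col] not in (None, "") for col in KEEP[:3])


def transform(row):
    # Filtering: keep only the columns to be loaded (drops "notes").
    out = {col: row[col] for col in KEEP}
    # Cleaning: map NULL to 0 and long gender values to codes.
    for col in ("q1_sales", "q2_sales"):
        out[col] = out[col] if out[col] is not None else 0
    out["gender"] = GENDER_MAP.get(out["gender"], out["gender"])
    # Splitting: one name column becomes two columns.
    out["first_name"], out["last_name"] = out.pop("full_name").split(" ", 1)
    # Joining: replace the code with the value from the lookup source.
    out["region"] = REGION_LOOKUP[out.pop("region_code")]
    # Business rule (derivation): calculate a new measure.
    out["total_sales"] = out["q1_sales"] + out["q2_sales"]
    return out


def transpose(rows, cols):
    # Transposing: one output row per (customer, column) pair.
    return [(r["cust_id"], col, r[col]) for r in rows for col in cols]


clean = [transform(r) for r in raw_rows if is_valid(r)]
print([(r["cust_id"], r["gender"], r["region"], r["total_sales"]) for r in clean])
# [('C001', 'F', 'North', 100), ('C002', 'M', 'South', 120)]
print(transpose(clean, ("q1_sales", "q2_sales")))
```

The third raw row has empty values in its first three columns, so validation rejects it before any other rule runs.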
[Figure: departmentally structured data versus organizationally structured data in the data warehouse.]
NEED FOR DATA WAREHOUSING
Better business intelligence for end-users
Reduction in time to locate, access, and analyze
information
Consolidation of disparate information sources
Strategic advantage over competitors
Faster time-to-market for products and services
Replacement of older, less-responsive decision
support systems
Reduction in demand on IS (information systems) staff to generate reports
ADVANTAGES & LIMITATIONS
ADVANTAGES
Integration of data from multiple sources
Performing new types of analyses
Reducing cost to access historical data
Improved decision support system
LIMITATIONS
Long initial implementation time and associated high cost
Adding new data sources takes time and incurs high cost