WHY DATA WAREHOUSE?
Report:
Purpose: Collection of data
Final goal: Improve decisions
OLTP (ONLINE TRANSACTION PROCESSING)
Example: CITY BANK
Products: Accounts, Loans, Mutual Funds, Insurance
Capture information:
-- Customer
-- Savings, Accounts
-- Insurance
Multi-dimensional analysis of data
-- Reporting
Figure: Front-end applications (Java, .NET) send INSERT/UPDATE/DELETE transactions (T1, T2, T3, T4) and SELECT statements (R1, R2, R3) to the database, transaction by transaction.
Data model: ER (Entity Relationship)
OLTP (Online Transactional Processing) systems are also called Transactional Systems or Operational Systems.
Figure: An account table (Account ID, Branch, Date, Balance) updated through Insert/Delete/Update operations from offices (IBM/Accenture/Dell) and from Branch/ATM channels.
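Figure: Informatica extracts (E) sales data into a staging area from source systems that define the same fields differently: Product as Char, String or Varchar2; Sale ID as Decimal or Numeric (one source is a Chennai sales system).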
Operational vs. Data Warehouse
Operational: insert, change, delete, replace
Data Warehouse: load, read-only access
Figure: Operational systems (SALES, LOANS, ACCOUNTS, HR) feeding the Data Warehouse.
Snapshot data
Time horizon: 5-10 years
A data warehouse stores historical data.
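To make the contrast above concrete (insert/change/delete/replace versus load and read-only access), here is a minimal Python sketch assuming a single hypothetical bank account: the operational store overwrites the balance in place, while the warehouse appends a dated snapshot on every load and never edits old rows. The names `operational`, `warehouse`, `apply_transaction` and `load_snapshot` are illustrative only.

```python
from datetime import date

# Operational (OLTP) store: one current row per account, changed in place.
operational = {"ACC1": 500000}

# Warehouse: append-only list of dated snapshots (load + read-only access).
warehouse = []


def apply_transaction(account_id, amount):
    """OLTP-style update: the previous balance is overwritten."""
    operational[account_id] = operational.get(account_id, 0) + amount


def load_snapshot(snapshot_date):
    """Warehouse-style load: append a copy of every current balance."""
    for account_id, balance in operational.items():
        warehouse.append((snapshot_date, account_id, balance))


load_snapshot(date(2011, 11, 19))
apply_transaction("ACC1", 8000)
load_snapshot(date(2011, 11, 20))

print(operational)  # only the latest balance survives
print(warehouse)    # both days' balances are kept as history
```

Because the warehouse only ever appends, it can answer "what was the balance on a given day?", which the operational store can no longer do.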
Figure: ETL → STAGING AREA → DATA WAREHOUSE → ETL → DATA MARTS → OLAP SERVER(S) → OLAP REPORTS
DATA ACQUISITION
What is ETL?
ETL stands for Extract Transform & Load
Aggregation of data
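The three ETL steps can be sketched in plain Python, assuming a small in-memory CSV feed as the source and a Python list standing in for the warehouse table. The function and column names are hypothetical; this is not the API of Informatica or any other ETL tool.

```python
import csv
import io

# Hypothetical source extract: a CSV feed from one operational system.
SOURCE_CSV = """sale_id,product,qty,price
1, pen ,2,10.0
2,Book,1,250.0
"""


def extract(text):
    """Extract: read raw rows from the source as dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: fix data types, tidy text and derive a sales amount."""
    out = []
    for row in rows:
        qty = int(row["qty"])
        price = float(row["price"])
        out.append({
            "sale_id": int(row["sale_id"]),
            "product": row["product"].strip().upper(),
            "sales_amount": qty * price,
        })
    return out


def load(rows, target):
    """Load: append the cleaned rows to the target table."""
    target.extend(rows)
    return len(rows)


warehouse_sales = []
loaded = load(transform(extract(SOURCE_CSV)), warehouse_sales)
print(loaded, warehouse_sales)
```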
Metadata
Data about data
Needed by both information technology (IT) personnel and users:
-- IT personnel need to know data sources and targets; database, table and column names; refresh schedules; data usage measures; etc.
-- Users need to know entity/attribute definitions; available report/query tools; report distribution information; help desk contact information; etc.
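One way to picture metadata is as a record that describes a warehouse table. The sketch below is a hypothetical Python dataclass covering a few of the items listed above (source, target, column names, refresh schedule, business definitions); the field names and values are assumptions, not a standard metadata model.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    """Hypothetical metadata entry: data about one warehouse table."""
    source: str                # where the data comes from (IT view)
    target: str                # warehouse table it is loaded into (IT view)
    columns: list[str]         # column names (IT view)
    refresh_schedule: str      # how often it is reloaded (IT view)
    definitions: dict[str, str] = field(default_factory=dict)  # business meanings (user view)


sales_meta = TableMetadata(
    source="Sales OLTP system",
    target="SALES_FACT",
    columns=["date_key", "product_key", "sales_amount"],
    refresh_schedule="daily",
    definitions={"sales_amount": "Quantity multiplied by unit price"},
)

# IT personnel and business users read different parts of the same record.
print(sales_meta.source, "->", sales_meta.target, sales_meta.refresh_schedule)
print(sales_meta.definitions["sales_amount"])
```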
Figure: Sources (Relational, ERP, Mainframe, File) → Extract → Staging Area (Transform) → Load → Presentation System
Data Transformation:
It is the process of cleaning the data and transforming it into the required business format.
The following data transformation activities take place in the staging area:
-- Data Merging
-- Data Cleansing
-- Data Scrubbing
-- Data Aggregation
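Data merging and data cleansing are listed without examples, so here is a minimal sketch that echoes the earlier Sale ID / Product data-type mismatch. It assumes two hypothetical sales sources: cleansing converts both to one standard type and format, and merging combines them into one staging list. Dropping duplicate Sale IDs (keeping the first) is an assumed rule chosen for illustration.

```python
from decimal import Decimal

# Hypothetical source A: Sale ID arrives as a Decimal, product padded like a CHAR column.
source_a = [{"sale_id": Decimal("101"), "product": "PEN       "}]
# Hypothetical source B: Sale ID arrives as text, product in mixed case.
source_b = [{"sale_id": "102", "product": "book"}, {"sale_id": "101", "product": "Pen"}]


def cleanse(row):
    """Cleansing: convert every source to one standard type and format."""
    return {
        "sale_id": int(row["sale_id"]),
        "product": str(row["product"]).strip().upper(),
    }


def merge(*sources):
    """Merging: combine cleansed rows from all sources, keeping the first row per sale_id."""
    merged = {}
    for source in sources:
        for row in source:
            clean = cleanse(row)
            merged.setdefault(clean["sale_id"], clean)
    return sorted(merged.values(), key=lambda r: r["sale_id"])


staging = merge(source_a, source_b)
print(staging)
```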
Data Scrubbing:
It is the process of deriving new data definitions from existing data.
Example: Concat(First Name + Last Name), Sales Amount = QTY * Price
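Both scrubbing examples derive a new column from existing ones. A minimal Python sketch, assuming hypothetical staging rows and a single space as the name separator:

```python
# Hypothetical staging rows; column names are illustrative only.
rows = [
    {"first_name": "Ravi", "last_name": "Kumar", "qty": 3, "price": 40.0},
    {"first_name": "Anu", "last_name": "Das", "qty": 1, "price": 99.5},
]


def scrub(row):
    """Derive new columns from existing ones without changing the originals."""
    derived = dict(row)
    derived["full_name"] = row["first_name"] + " " + row["last_name"]  # Concat(First Name + Last Name)
    derived["sales_amount"] = row["qty"] * row["price"]                 # QTY * Price
    return derived


scrubbed = [scrub(r) for r in rows]
for r in scrubbed:
    print(r["full_name"], r["sales_amount"])
```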
Data Aggregation:
It is the process of calculating summaries for a group of records using aggregate functions.
Example: Average, Max, Min, etc.
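Aggregation groups records and applies one function per group, much like SQL `GROUP BY` with `AVG`, `MAX` and `MIN`. A minimal Python sketch, assuming hypothetical (branch, sales amount) rows:

```python
from collections import defaultdict

# Hypothetical sales rows: (branch, sales_amount).
sales = [("Chennai", 100.0), ("Chennai", 300.0), ("Hyderabad", 50.0)]


def aggregate(rows):
    """Group rows by branch and compute average, maximum and minimum per group."""
    groups = defaultdict(list)
    for branch, amount in rows:
        groups[branch].append(amount)
    return {
        branch: {"avg": sum(v) / len(v), "max": max(v), "min": min(v)}
        for branch, v in groups.items()
    }


summary = aggregate(sales)
print(summary)
```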
Figure: Data Sources (Sales, Purchase and Inventory operational systems) → Staging Area → Data Warehouse → Data Marts
Dimension Table
FACT TABLE
Contains numerical metrics of the business
Can hold large volumes of data
Can grow quickly
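A minimal sketch of how a fact table relates to a dimension table, assuming hypothetical product and sales data: each fact row holds only a dimension key plus numeric metrics, and a report looks up descriptive attributes in the dimension before summing.

```python
# Dimension table: descriptive attributes keyed by a surrogate key.
product_dim = {
    1: {"product_name": "Pen", "category": "Stationery"},
    2: {"product_name": "Book", "category": "Stationery"},
    3: {"product_name": "Laptop", "category": "Electronics"},
}

# Fact table: foreign key to the dimension plus numeric metrics only.
sales_fact = [
    {"product_key": 1, "qty": 10, "amount": 100.0},
    {"product_key": 2, "qty": 2, "amount": 500.0},
    {"product_key": 3, "qty": 1, "amount": 45000.0},
    {"product_key": 1, "qty": 5, "amount": 50.0},
]


def amount_by_category(facts, dim):
    """Join each fact row to its dimension row and sum the metric by category."""
    totals = {}
    for row in facts:
        category = dim[row["product_key"]]["category"]
        totals[category] = totals.get(category, 0.0) + row["amount"]
    return totals


print(amount_by_category(sales_fact, product_dim))
```

The fact list is the part that grows with every sale; the dimension stays comparatively small.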
Types of Schemas
1. Star Schema
2. Snow Flake Schema
3. Galaxy Schema (also called Fact Constellation Schema)
STAR SCHEMA
SNOWFLAKE SCHEMA
GALAXY SCHEMA
CONFORMED DIMENSIONS
A dimension table that is shared across data marts or by more than one fact table
Example
Calendar/Date/Time Dimension
Customer Dimension
Product Dimension
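A minimal sketch of a shared date dimension, assuming two hypothetical data marts (sales and purchases) whose fact tables both reference the same date keys. Because both roll up through one dimension, their monthly totals line up and can be compared directly. All keys and values are illustrative.

```python
# One shared (conformed) date dimension; keys and values are illustrative.
date_dim = {
    4001: {"date": "2011-11-20", "month": 11, "year": 2011},
    4002: {"date": "2011-12-05", "month": 12, "year": 2011},
}

# Two fact tables from different data marts reference the same date keys.
sales_fact = [{"date_key": 4001, "amount": 1000.0}, {"date_key": 4002, "amount": 400.0}]
purchase_fact = [{"date_key": 4001, "cost": 700.0}, {"date_key": 4001, "cost": 100.0}]


def total_by_month(facts, measure):
    """Roll a measure up to (year, month) through the shared date dimension."""
    totals = {}
    for row in facts:
        d = date_dim[row["date_key"]]
        key = (d["year"], d["month"])
        totals[key] = totals.get(key, 0.0) + row[measure]
    return totals


sales_by_month = total_by_month(sales_fact, "amount")
purchases_by_month = total_by_month(purchase_fact, "cost")
for month in sorted(sales_by_month):
    print(month, sales_by_month[month], purchases_by_month.get(month, 0.0))
```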
SURROGATE KEYS
A surrogate key has no meaning other than guaranteeing uniqueness for each record stored in a dimension table.
It is used in all dimension tables.
It is just a sequence number.
Advantages of surrogate keys include:
-- Control over data
-- Reduced fact table size
Avoid using OLTP keys as data warehouse keys.
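A surrogate key can be sketched as a sequence plus a lookup from natural (OLTP) key to the number already assigned. In this minimal Python sketch, the class name is hypothetical, the start value 1001 mirrors the Empkey example below, and the second employee ID is invented for illustration.

```python
from itertools import count


class SurrogateKeyGenerator:
    """Assigns a meaningless sequence number to each new natural (OLTP) key."""

    def __init__(self, start):
        self._next = count(start)  # simple sequence, similar to a database sequence
        self._lookup = {}          # natural key -> surrogate key

    def key_for(self, natural_key):
        """Return the existing surrogate key, or allocate the next number."""
        if natural_key not in self._lookup:
            self._lookup[natural_key] = next(self._next)
        return self._lookup[natural_key]


emp_keys = SurrogateKeyGenerator(start=1001)
print(emp_keys.key_for("sat/001/hyd/7924"))  # first employee gets the first number
print(emp_keys.key_for("sat/002/hyd/7925"))  # hypothetical second employee
print(emp_keys.key_for("sat/001/hyd/7924"))  # same natural key, same surrogate key
```

Keeping the lookup lets repeated loads of the same OLTP record reuse its key instead of creating a duplicate dimension row.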
Natural (OLTP) key → surrogate key:
Empid sat/001/hyd/7924 → Empkey 1001
Projid GE/comm/US/NJ/001 → Projkey 3001
Locid Hye/001/hitech/204 → Lockey 2001
Dateid 20/11/2011 → Datekey 4001
Figure: Star schema with a fact table (Emp Key, Loc Key, Proj Key, Date Key) joined to Emp Dim (Emp Key, Emp id, Ename), Proj Dim (Proj Key, Proj id, Proj name), Loc Dim (Loc Key, Loc id, Loc name) and Date Dim (Date Key, Date id, Month, Year).
Benefits:
-- Reduced space in the fact table
-- Integer-to-integer comparison instead of string-to-string comparison
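The space saving can be estimated roughly with the four example keys above. This sketch assumes one byte per ASCII character for the text keys and a 4-byte integer per surrogate key, and ignores real database storage overhead (length prefixes, padding, indexes), so it shows only the order of magnitude for one fact row.

```python
import struct

# Natural (OLTP) keys from the example versus their integer surrogate keys.
natural_keys = ["sat/001/hyd/7924", "GE/comm/US/NJ/001", "Hye/001/hitech/204", "20/11/2011"]
surrogate_keys = [1001, 3001, 2001, 4001]

# Bytes to store one fact row's four keys as text (one byte per ASCII character).
natural_bytes = sum(len(k.encode("ascii")) for k in natural_keys)
# Bytes to store the same four keys as 4-byte integers (standard sizes, no padding).
surrogate_bytes = struct.calcsize("<4i")

print(natural_bytes, surrogate_bytes)
```

The same reasoning applies to joins: comparing two integers is a single comparison, while comparing two strings may need to examine them character by character.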