
DATA WAREHOUSING

People Making Technology Wor


What is a Data Warehouse?

► A data warehouse is a single, integrated source of
decision support information formed by
collecting data from multiple sources, internal to
the organization as well as external, and
transforming and summarising this information
to enable improved decision making.

► A data warehouse is designed for easy access by
users to large amounts of information, and data
access is typically supported by specialized
analytical tools and applications.
Evolution of DWH

Traditional approaches to computer system design during the 1980s:

► Were not optimized for analysis and reporting
► Could not support company-wide reporting from a
single system
► Often required writing specific computer programs to
develop reports, which was slow and expensive
Why should we consider Data Warehousing solutions ?

When users are requesting access to a large amount of
historical information for reporting purposes, you should
strongly consider a warehouse or mart. The user will benefit
when the information is organized in an efficient manner for
this type of access.


Def. Data Warehousing

► A DWH is a type of relational database system specially
designed for query and analysis processing rather than
transactional processing.

► DWH systems are also called Historical DBs,
Read-only DBs, Integrated DBs, Decision Support
Systems, Executive Info Systems, Business Info Systems.
Characteristics of DWH

► Subject Oriented
► Non Volatile
► Integrated
► Time Variant
Subject Oriented

• Example for an insurance company:

[Figure: application-oriented systems versus subject-oriented warehouse data]
Applications Area: Auto and Fire Policy Processing Systems; Commercial and Life Insurance Systems; Claims Processing System; Accounting System; Billing System
Data Warehouse: Customer Data; Policy Data; Losses Data; Premium Data
Integrated

• Data is stored once in a single integrated location
(e.g. insurance company)

[Figure: customer data stored in several databases (Auto Policy Processing System; Fire Policy Processing System; FACTS, LIFE, Commercial, Accounting Applications) is integrated into the Data Warehouse Database under Subject = Customer]
Time - Variant

• Data is stored as a series of snapshots or views which record how it is collected across time.

[Figure: Data Warehouse Data; labels: Time, Data, Key]

► Data is tagged with some element of time - creation date, as of date, etc.

► Data is available on-line for long periods of time for trend analysis and
forecasting. For example, five or more years.
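As a rough illustration of time-variant storage, the sketch below appends one snapshot row per load and tags each row with an as-of date. The table layout, the `load_snapshot` and `history` helpers, and the sample balances are all assumptions made for this example, not part of the original slides.

```python
from datetime import date

# Hypothetical warehouse table: each row is a snapshot tagged with an as-of date.
snapshots = []

def load_snapshot(as_of, customer_id, balance):
    """Append a new snapshot row; earlier rows are never modified."""
    snapshots.append({"as_of": as_of, "customer_id": customer_id, "balance": balance})

# Three month-end loads for the same customer.
load_snapshot(date(2005, 1, 31), 1, 100.0)
load_snapshot(date(2005, 2, 28), 1, 140.0)
load_snapshot(date(2005, 3, 31), 1, 90.0)

def history(customer_id):
    """Return the (as_of, balance) trend for one customer, oldest first."""
    rows = [r for r in snapshots if r["customer_id"] == customer_id]
    rows.sort(key=lambda r: r["as_of"])
    return [(r["as_of"], r["balance"]) for r in rows]

trend = history(1)
```

Because every load adds rows instead of replacing them, `trend` keeps the full history needed for trend analysis.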
Non-Volatile

• Existing data in the warehouse is not overwritten or updated.

[Figure: External Sources and Production Databases feed the Data Warehouse Database. Production Applications: Update, Insert, Delete. Data Warehouse Environment: Load, Read-Only]
OLAP

Online Analytical Processing

A model through which users can slice and dice data.

For analytical & reporting purposes.

Faster response.
Example OLTP Model
OLAP Model
Differences

► DWH database (OLAP): Designed for analysis of business measures by category and attributes.
  OLTP database: Designed for real time business operations.
► DWH database (OLAP): Optimized for bulk loads and large, complex, unpredictable queries that access many rows per table.
  OLTP database: Optimized for a common set of transactions, usually adding or retrieving a single row at a time per table.
► DWH database (OLAP): Loaded with consistent, valid data; requires no real time validation.
  OLTP database: Optimized for validation of incoming data during transactions; uses validation data tables.
► DWH database (OLAP): Supports few concurrent users relative to OLTP.
  OLTP database: Supports thousands of concurrent users.
OLAP Database                        | OLTP Database
Multidimensional Database Structures | Normalized Data Structures
Index - Many                         | Index - Few
Joins - Few                          | Joins - Many
Aggregated Data - More               | Aggregated Data - Less
No. of users - Few                   | No. of users - More
Periodic update of data              | Data Modification - More
Huge volumes of data                 | Small volumes of data
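The contrast in access patterns can be sketched with Python's built-in `sqlite3` module. The `orders` table and its rows are invented for illustration. A real OLTP system and a real warehouse would be separate databases, so this only shows the shape of the two query styles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (order_id, region, amount) VALUES (?, ?, ?)",
    [(1, "East", 100.0), (2, "West", 250.0), (3, "East", 50.0)],
)

# OLTP-style access: retrieve a single row by its key.
oltp_row = conn.execute(
    "SELECT order_id, region, amount FROM orders WHERE order_id = ?", (2,)
).fetchone()

# OLAP-style access: read many rows and aggregate them by a category.
olap_rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
```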


Transaction - Warehouse Process

[Figure: "Transaction Based Process" (day-to-day operations; on-line, real time update; detailed information to operational systems) feeds, via Batch Load, Transform, and Summarize & Refine, the "Warehouse Based Process" (decision support for management use)]
DWH Life Cycle

Business Analyst

Data Modeler

ETL Developer

Report Developer

Testing
DWH Architecture

Three common architectures are:

► DWH Architecture (Basic)

► DWH Architecture (With a staging area)

► DWH Architecture (With a staging area and data marts)


DWH Architecture (Basic)
DWH Architecture (with a staging area)
DWH Architecture (with a staging area and data marts)
Ralph Kimball

Bottom-up approach

Data marts are first created to provide reporting and analytical capabilities for
specific business processes.

Data marts contain atomic data and, if necessary, summarized data.

These data marts can eventually be unioned together to create a comprehensive
data warehouse.
Bill Inmon

Top-down approach

Data warehouse as a centralized repository for the entire enterprise.

Data warehouse is designed using a normalized enterprise data model.

"Atomic" data, that is, data at the lowest level of detail, are stored in the data
warehouse.
Dimensions & Measures

Data warehouse consists of dimensions and measures.

Dimensions allow data analysis from various perspectives. Product
dimension could help you see which products bring in the most revenue.

Measures are numeric representations of a set of facts that have
occurred. Examples of measures include dollars of sales.
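A small sketch may make the split concrete: below, `product` and `region` play the role of dimensions and `sales_dollars` is the measure. The rows and the `total_by` helper are hypothetical and chosen only to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical sales facts: product and region are dimensions,
# sales_dollars is the measure.
sales = [
    {"product": "Laptop", "region": "East", "sales_dollars": 1200.0},
    {"product": "Phone", "region": "East", "sales_dollars": 800.0},
    {"product": "Laptop", "region": "West", "sales_dollars": 900.0},
    {"product": "Phone", "region": "West", "sales_dollars": 300.0},
]

def total_by(dimension):
    """Sum the measure for each value of the chosen dimension."""
    totals = defaultdict(float)
    for row in sales:
        totals[row[dimension]] += row["sales_dollars"]
    return dict(totals)

by_product = total_by("product")   # the same facts seen from the product perspective
by_region = total_by("region")     # ...and from the region perspective
top_product = max(by_product, key=by_product.get)
```

Here `top_product` answers the question of which product brings in the most revenue, using nothing but the product dimension and the sales measure.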
Dimensional Data Modeling

To develop an OLAP schema design, a data modeler follows the
dimensional modeling approach.
Dimensional modeling is a 3-stage process:

► Conceptual modeling
► Logical modeling
► Physical modeling
Before implementing the schema design, a
data modeler should work through the following
steps:

► Understand the client's business requirements
► Understand the grain of the fact
► Design the dimension tables
► Design the fact tables
Example of Dimensional Data Model (Star Schema Design)
Fact Table

► Contains numeric measures of the business
► Contains facts and is connected to dimensions
► Has two types of columns:
facts or measures
foreign keys to dimension tables
► Might contain either detail-level facts
or facts that have been aggregated
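The sketch below shows one possible fact table with the two column types described above, using Python's built-in `sqlite3` module. The table names (`sales_fact`, `product_dim`, `location_dim`), the columns and the sample rows are assumptions made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE location_dim (location_id INTEGER PRIMARY KEY, city_name TEXT);

-- Fact table: foreign keys to the dimension tables plus one numeric measure.
CREATE TABLE sales_fact (
    product_id INTEGER REFERENCES product_dim(product_id),
    location_id INTEGER REFERENCES location_dim(location_id),
    sales_dollar REAL
);

INSERT INTO product_dim VALUES (1, 'Laptop'), (2, 'Phone');
INSERT INTO location_dim VALUES (1, 'Manhattan'), (2, 'Jersey City');
INSERT INTO sales_fact VALUES (1, 1, 1000.0), (1, 2, 500.0), (2, 1, 300.0);
""")

# Join the fact table to its dimensions and aggregate the measure.
rows = conn.execute("""
    SELECT p.product_name, l.city_name, SUM(f.sales_dollar)
    FROM sales_fact f
    JOIN product_dim p ON p.product_id = f.product_id
    JOIN location_dim l ON l.location_id = f.location_id
    GROUP BY p.product_name, l.city_name
    ORDER BY p.product_name, l.city_name
""").fetchall()
```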
Steps in designing Fact Table

► Identify a business process for analysis (like
sales).
► Identify measures or facts (sales dollar).
► Identify dimensions for facts (product dimension,
location dimension, time dimension, organization
dimension).
► List the columns that describe each dimension
(region name, branch name).
► Determine the lowest level of summary in a fact
table (sales dollar).
Types of Facts (Measures)

► Additive - Measures that can be added across
all dimensions.

► Semi Additive - Measures that can be added
across some dimensions but not others.

► Non Additive - Measures that cannot be
added across any dimension.
In the example, the sales fact table is connected to the dimensions
location, product, time and organization. The measure "Sales
Dollar" in the sales fact table can be added across all
dimensions independently or in a combined manner,
which is explained below.
► Sales Dollar value for a particular product
► Sales Dollar value for a product in a location
► Sales Dollar value for a product in a year within a
location
► Sales Dollar value for a product in a year within a
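The difference between additive and semi-additive measures can be sketched as follows. The `balance` column and the account data are hypothetical additions for illustration. Only "Sales Dollar" comes from the slides.

```python
# Hypothetical daily facts: sales_dollar is additive, balance is semi-additive.
facts = [
    {"day": 1, "account": "A", "sales_dollar": 10.0, "balance": 100.0},
    {"day": 1, "account": "B", "sales_dollar": 20.0, "balance": 200.0},
    {"day": 2, "account": "A", "sales_dollar": 30.0, "balance": 110.0},
    {"day": 2, "account": "B", "sales_dollar": 40.0, "balance": 190.0},
]

# Additive: summing over every dimension (day and account) is meaningful.
total_sales = sum(f["sales_dollar"] for f in facts)

# Semi-additive: balances can be added across accounts for a single day...
balance_day2 = sum(f["balance"] for f in facts if f["day"] == 2)

# ...but not across days; the closing balance of an account is its latest value.
last_day = max(f["day"] for f in facts)
closing_a = next(
    f["balance"] for f in facts if f["account"] == "A" and f["day"] == last_day
)
```

A non-additive measure, such as a percentage or ratio, would normally be recomputed from its additive components at each level rather than summed.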
Fact Granularity
► A fact table maintains numerical information.
► Granularity is defined as the level at which fact information is
stored.
► The level is determined by the dimension tables, for example the time dimension:
Year?
Quarter?
Month?
Week?
Day?
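To illustrate why the grain matters, the sketch below stores facts at day level and rolls them up to coarser levels. The `roll_up` helper and the sample rows are assumptions for this example. Facts stored only at month level could not be broken back down into days.

```python
from collections import defaultdict
from datetime import date

# Hypothetical facts stored at the finest grain used here: one row per day.
daily_sales = [
    (date(2005, 1, 3), 100.0),
    (date(2005, 1, 17), 50.0),
    (date(2005, 2, 1), 75.0),
    (date(2005, 4, 10), 25.0),
]

def roll_up(rows, level):
    """Aggregate day-level facts to a coarser time level."""
    totals = defaultdict(float)
    for day, amount in rows:
        if level == "month":
            key = (day.year, day.month)
        elif level == "quarter":
            key = (day.year, (day.month - 1) // 3 + 1)
        elif level == "year":
            key = (day.year,)
        else:
            raise ValueError(f"unsupported level: {level}")
        totals[key] += amount
    return dict(totals)

by_month = roll_up(daily_sales, "month")
by_quarter = roll_up(daily_sales, "quarter")
by_year = roll_up(daily_sales, "year")
```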
Dimension Tables
► Contain textual information that represents
attributes of the business
► Contain relatively static data
► Are joined to fact table through a foreign key
reference
► Are usually smaller than fact tables

Example of Location Dimension

Location Dimension Id | Country Name | State Name | County Name | City Name   | Date Time Stamp
1                     | USA          | New York   | Shelby      | Manhattan   | 1/1/2005 11:23:31 AM
2                     | USA          | Florida    | Jefferson   | Panama City | 1/1/2005 11:23:31 AM
3                     | USA          | California | Montgomery  | San Jose    | 1/1/2005 11:23:31 AM
4                     | USA          | New Jersey | Hudson      | Jersey City | 1/1/2005 11:23:31 AM


Star Schema Design benefits

► Easy for users to understand

► Fast response to queries

► Supports multidimensional analysis

► Supported by many front end tools


Snowflake Schema Design

► Dimension table hierarchies are broken into
simpler tables
► Some organizations normalize the
dimension tables to save space
► Both fact and dimension tables are normalized
► Increases the number of joins, which leads to poor
performance in retrieval of data
► May become large and unmanageable
► Degrades query performance
Example of Snowflake Schema
Important aspects of Star Schema & Snow Flake Schema

► In a star schema, every dimension will have a primary key.
► In a star schema, a dimension table will not have any parent
table.
► Whereas in a snowflake schema, a dimension table will have
one or more parent tables.
► Hierarchies for the dimensions are stored in the dimension
table itself in a star schema.
► Whereas hierarchies are broken into separate tables in a
snowflake schema. These hierarchies help to drill down the data
from the topmost level to the lowermost level.
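The two layouts can be compared with a small `sqlite3` sketch. All table and column names here (`location_dim_star`, `location_dim_snow`, `state_dim`, `sales_fact`) are invented for illustration. The point is only that the snowflake version needs one more join to reach the same answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: the city -> state hierarchy is kept inside one dimension table.
CREATE TABLE location_dim_star (
    location_id INTEGER PRIMARY KEY, city_name TEXT, state_name TEXT
);
INSERT INTO location_dim_star VALUES
    (1, 'Manhattan', 'New York'), (2, 'Jersey City', 'New Jersey');

-- Snowflake: the state level is split out into a parent table.
CREATE TABLE state_dim (state_id INTEGER PRIMARY KEY, state_name TEXT);
CREATE TABLE location_dim_snow (
    location_id INTEGER PRIMARY KEY,
    city_name TEXT,
    state_id INTEGER REFERENCES state_dim(state_id)
);
INSERT INTO state_dim VALUES (10, 'New York'), (20, 'New Jersey');
INSERT INTO location_dim_snow VALUES (1, 'Manhattan', 10), (2, 'Jersey City', 20);

CREATE TABLE sales_fact (location_id INTEGER, sales_dollar REAL);
INSERT INTO sales_fact VALUES (1, 100.0), (1, 40.0), (2, 60.0);
""")

# Star schema: one join from fact to dimension.
star = conn.execute("""
    SELECT d.state_name, SUM(f.sales_dollar)
    FROM sales_fact f
    JOIN location_dim_star d ON d.location_id = f.location_id
    GROUP BY d.state_name ORDER BY d.state_name
""").fetchall()

# Snowflake schema: an extra join to reach the parent (state) table.
snow = conn.execute("""
    SELECT s.state_name, SUM(f.sales_dollar)
    FROM sales_fact f
    JOIN location_dim_snow d ON d.location_id = f.location_id
    JOIN state_dim s ON s.state_id = d.state_id
    GROUP BY s.state_name ORDER BY s.state_name
""").fetchall()
```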
Data Acquisition
► It is the process of extracting the relevant
business information from the different source
systems, transforming the data from one format
into another, integrating the data into
a homogeneous format and loading the data into
a warehouse database.
► Data Extraction (E)
► Data Transformation (T)
► Data Loading (L)
Sample ETL Process Flow
ETL Process

The ETL process has the following basic steps:

► Mapping the data between source systems and
target database

► Cleansing of source data in the staging area

► Transforming cleansed source data and then
loading into the target system
► Source System
A database, application, file, or other storage facility from
which the data in a data warehouse is derived.
► Mapping
The definition of the relationship and data flow between
source and target objects.
► Staging Area
A place where data is processed before entering the
warehouse.
► Cleansing
The process of resolving inconsistencies and fixing the
anomalies in source data, typically as part of the ETL
process.
► Transformation
The process of manipulating data. Any manipulation beyond
copying is a transformation. Examples include cleansing,
aggregating, and integrating data from multiple sources.
► Transportation
The process of moving copied or transformed data from a
source to a data warehouse.
► Target System
A database, application, file, or other storage facility to which
the "transformed source data" is loaded in a data warehouse.
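The terms above can be tied together in a minimal end-to-end sketch. Everything in it is hypothetical: the source rows, the cleansing rules, the `sales_summary` target table and the function names are assumptions for illustration, not a description of any particular ETL tool.

```python
import sqlite3

# Extract: rows as they might arrive from a hypothetical source system.
source_rows = [
    {"cust": " alice ", "amount": "100.50", "region": "east"},
    {"cust": "BOB", "amount": "n/a", "region": "West"},  # anomaly: bad amount
    {"cust": "Alice", "amount": "20", "region": "EAST"},
]

def cleanse(rows):
    """Staging step: normalise text and drop rows whose amount is not numeric."""
    clean = []
    for r in rows:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # reject the inconsistent row
        clean.append({
            "cust": r["cust"].strip().title(),
            "region": r["region"].strip().title(),
            "amount": amount,
        })
    return clean

def transform(rows):
    """Aggregate cleansed rows to one total per (customer, region)."""
    totals = {}
    for r in rows:
        key = (r["cust"], r["region"])
        totals[key] = totals.get(key, 0.0) + r["amount"]
    return [(cust, region, total) for (cust, region), total in sorted(totals.items())]

def load(conn, rows):
    """Load the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_summary (customer TEXT, region TEXT, total REAL)"
    )
    conn.executemany("INSERT INTO sales_summary VALUES (?, ?, ?)", rows)
    conn.commit()

target = sqlite3.connect(":memory:")  # stands in for the target system
load(target, transform(cleanse(source_rows)))
loaded = target.execute("SELECT customer, region, total FROM sales_summary").fetchall()
```

The mapping between source and target is implicit here in the field names used by `cleanse` and `load`. A real ETL tool would normally hold that mapping as separate metadata.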
Thank You !!!
