Public
Introduction to DWH
Introduction, course agenda and assumptions
Discussions and Clarifications
What is a Data Warehouse?
Need for a Data Warehouse (reporting aspects)
BI application types (ad hoc, standard reporting, analytic applications, dashboards) and audience
Different approaches to Data Warehouse design
Advantages and disadvantages of different Data Warehouse designs
Layers of DWH: OLTP, OLAP, Data Mart, ODS, etc.
Dimensional modeling Fundamentals
Basic characteristics of Fact and Dimension tables
Types of Dimension
Types of Facts
What is a DWH:
A data warehouse is:
Subject-oriented
Integrated
Time-variant
Non-volatile
OLAP vs OLTP
DWH Architecture:
Different data warehousing systems have different structures.
Some may have an ODS, while some may have multiple Data
Marts.
In general, all data warehouse systems have the following layers:
Staging Area:
A staging area is a place on the data warehouse server where
temporary tables are held.
It is needed to hold incoming data and to perform data cleansing
and merging before the data is loaded into the warehouse.
In the absence of a staging area, data would have to be loaded
directly from the OLTP system into the OLAP system.
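The hold, cleanse, merge and load sequence described above can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module; the table names (stg_customer, dw_customer), the sample rows and the cleansing rules are assumptions made for the example, not part of the course material.

```python
import sqlite3

# Hypothetical raw rows as they might arrive from two source systems.
crm_rows = [(" 101", "alice@example.com "), ("102", "BOB@EXAMPLE.COM")]
billing_rows = [("102", "bob@example.com"), ("103", None)]

conn = sqlite3.connect(":memory:")
# Temporary staging table: accepts the data exactly as it arrives.
conn.execute("CREATE TABLE stg_customer (customer_id TEXT, email TEXT)")
# Warehouse table: holds only cleansed, merged rows.
conn.execute(
    "CREATE TABLE dw_customer (customer_id INTEGER PRIMARY KEY, email TEXT)"
)

# 1. Land the raw data in staging without transforming it.
conn.executemany(
    "INSERT INTO stg_customer VALUES (?, ?)", crm_rows + billing_rows
)

# 2. Cleanse (trim, lower-case, drop rows with no email), merge the
#    duplicates coming from the two sources, and load the warehouse.
conn.execute(
    """
    INSERT INTO dw_customer (customer_id, email)
    SELECT DISTINCT CAST(TRIM(customer_id) AS INTEGER), LOWER(TRIM(email))
    FROM stg_customer
    WHERE email IS NOT NULL
    """
)

# 3. Staging data is temporary, so clear it once the load is done.
conn.execute("DELETE FROM stg_customer")

loaded = conn.execute(
    "SELECT customer_id, email FROM dw_customer ORDER BY customer_id"
).fetchall()
print(loaded)
```

Without the staging table, the trimming and de-duplication would have to run against the source system or inside the warehouse tables themselves.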
ODS:
An operational data store (ODS) is a type of database
that is often used as an interim logical area for a data
warehouse.
It is an integrated database of operational data.
Its sources include legacy systems, and it contains current
or near-term data.
An ODS typically holds only a short window of recent data
(for example, 30 to 60 days), while a data warehouse
typically contains years of data.
An ODS is designed to quickly perform relatively simple
queries on smaller volumes of data, such as finding the
orders of a customer or looking for available items in a
retail store.
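As an illustration of the kind of simple, small-volume lookup an ODS is meant for, here is a minimal sketch using Python's sqlite3 module. The ods_order table, its columns, the index and the orders_for_customer helper are hypothetical names chosen for the example.

```python
import sqlite3

ods = sqlite3.connect(":memory:")
ods.execute(
    "CREATE TABLE ods_order ("
    "order_id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT)"
)
# An index on the lookup column keeps single-customer queries fast.
ods.execute("CREATE INDEX idx_ods_order_customer ON ods_order (customer_id)")
ods.executemany(
    "INSERT INTO ods_order VALUES (?, ?, ?)",
    [(1, 7, "shipped"), (2, 8, "open"), (3, 7, "open")],
)


def orders_for_customer(customer_id):
    """Return (order_id, status) for one customer's current orders."""
    return ods.execute(
        "SELECT order_id, status FROM ods_order "
        "WHERE customer_id = ? ORDER BY order_id",
        (customer_id,),
    ).fetchall()


customer_orders = orders_for_customer(7)
print(customer_orders)
```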
Data Mart:
A data mart is the access layer of the DWH environment that is used
to get data out to the users.
The data mart is a subset of the data warehouse that is usually
oriented to a specific business line or team.
DWH vs Data Mart:
A data mart concentrates on integrating information from a given
subject area or set of source systems.
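One way to picture a data mart as a subset of the warehouse is a view restricted to a single business line. The sketch below assumes a hypothetical dw_sales warehouse table and a mart_retail_sales view; in practice a data mart is often a separate physical schema or database rather than a view, so treat this only as a small illustration of the "subset" idea.

```python
import sqlite3

dwh = sqlite3.connect(":memory:")
# Warehouse table covering every business line.
dwh.execute(
    "CREATE TABLE dw_sales (sale_id INTEGER, business_line TEXT, amount REAL)"
)
dwh.executemany(
    "INSERT INTO dw_sales VALUES (?, ?, ?)",
    [(1, "retail", 120.0), (2, "wholesale", 900.0), (3, "retail", 80.0)],
)

# The mart exposes only the rows and columns the retail team needs.
dwh.execute(
    "CREATE VIEW mart_retail_sales AS "
    "SELECT sale_id, amount FROM dw_sales WHERE business_line = 'retail'"
)

retail_total = dwh.execute(
    "SELECT SUM(amount) FROM mart_retail_sales"
).fetchone()[0]
print(retail_total)
```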
Dimensional Modeling:
Dimensional modeling is the design concept used by many
data warehouse designers to build their data warehouse.
The dimensional model is the underlying data model used by
many of the commercial OLAP products available in the
market today.
In this model, all data is contained in two types of tables:
Dimension
Fact
Dimensional Modeling:
Star Schema
A star schema model can be
depicted as a simple star: a
central table contains fact
data and multiple tables
radiate out from it, connected
by the primary and foreign
keys of the database.
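A minimal star schema might look like the sketch below, built with Python's sqlite3 module. The table and column names (fact_sales, dim_product, dim_date) and the sample rows are assumptions for illustration only.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    -- Dimension tables: one row per product / per date.
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT
    );
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY, calendar_date TEXT, month TEXT
    );
    -- Central fact table: foreign keys to each dimension plus measures.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product (product_key),
        date_key INTEGER REFERENCES dim_date (date_key),
        quantity INTEGER,
        amount REAL
    );
    INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Home');
    INSERT INTO dim_date VALUES
        (20240101, '2024-01-01', '2024-01'),
        (20240201, '2024-02-01', '2024-02');
    INSERT INTO fact_sales VALUES
        (1, 20240101, 10, 15.0), (2, 20240101, 1, 40.0), (1, 20240201, 4, 6.0);
    """
)

# A typical star query: join the fact table to a dimension and aggregate.
sales_by_category = db.execute(
    """
    SELECT p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
    """
).fetchall()
print(sales_by_category)
```

Every dimension is a single join away from the fact table, which is what gives the schema its star shape.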
Snowflake Schema
The snowflake schema
represents a dimensional
model which is also composed
of a central fact table and a
set of constituent dimension
tables which are further
normalized into sub-dimension
tables.
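The normalization into sub-dimension tables can be sketched as follows, again with Python's sqlite3 module. Here a hypothetical product dimension has its category attribute moved into a separate dim_category table; all names and rows are illustrative assumptions.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    -- Sub-dimension: category is normalized out of the product dimension.
    CREATE TABLE dim_category (
        category_key INTEGER PRIMARY KEY, category_name TEXT
    );
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,
        product_name TEXT,
        category_key INTEGER REFERENCES dim_category (category_key)
    );
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product (product_key),
        amount REAL
    );
    INSERT INTO dim_category VALUES (1, 'Stationery'), (2, 'Home');
    INSERT INTO dim_product VALUES (1, 'Pen', 1), (2, 'Pencil', 1), (3, 'Lamp', 2);
    INSERT INTO fact_sales VALUES (1, 15.0), (2, 5.0), (3, 40.0);
    """
)

# Reaching the category now takes two joins instead of one.
sales_by_category = db.execute(
    """
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_category AS c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
    """
).fetchall()
print(sales_by_category)
```

The category name is stored once instead of on every product row, at the cost of an extra join in each query that needs it.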
Dimension:
A dimension table consists of the attributes that describe the facts.
Dimensions store the textual descriptions of the business.
The different types of dimensions are:
Conformed Dimension:
A dimension that has exactly the same meaning and content when being referred
to from different fact tables.
Junk Dimension:
A junk dimension is a collection of miscellaneous transactional codes, flags, and/or text
attributes that are unrelated to any particular dimension. The junk dimension is
simply a structure that provides a convenient place to store these attributes.
Degenerate Dimensions:
A degenerate dimension is a dimension attribute that is stored as part of the fact
table, and not in a separate dimension table.
Role Playing Dimensions:
A role-playing dimension is one where the same dimension key along with its
associated attributes can be joined to more than one foreign key in the fact
table.
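Two of the dimension types above can be seen together in one small sketch using Python's sqlite3 module. The single dim_date table plays two roles (order date and ship date), and order_number sits directly on the fact table as a degenerate dimension. The table names, columns and sample row are assumptions for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, calendar_date TEXT);
    CREATE TABLE fact_order (
        order_number TEXT,  -- degenerate dimension: no separate table
        order_date_key INTEGER REFERENCES dim_date (date_key),
        ship_date_key INTEGER REFERENCES dim_date (date_key),
        amount REAL
    );
    INSERT INTO dim_date VALUES (1, '2024-03-01'), (2, '2024-03-04');
    INSERT INTO fact_order VALUES ('SO-1001', 1, 2, 250.0);
    """
)

# The same dimension is joined twice, once per role, under two aliases.
order_rows = db.execute(
    """
    SELECT f.order_number, od.calendar_date, sd.calendar_date
    FROM fact_order AS f
    JOIN dim_date AS od ON od.date_key = f.order_date_key
    JOIN dim_date AS sd ON sd.date_key = f.ship_date_key
    """
).fetchall()
print(order_rows)
```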
Dimension:
Slowly Changing Dimensions:
Slowly changing dimensions are dimensions whose attributes change over time.
Whether the history of changes to a particular attribute should be
preserved in the data warehouse depends on the business requirement.
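The choice between preserving and discarding history can be sketched in plain Python. The two behaviours shown are commonly called Type 1 (overwrite) and Type 2 (add a new row version), names the text above does not use; the CustomerRow class, its fields and the apply_change helper are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CustomerRow:
    surrogate_key: int
    customer_id: int               # business key from the source system
    city: str                      # the slowly changing attribute
    valid_from: str
    valid_to: Optional[str] = None  # None marks the current version


def apply_change(rows: List[CustomerRow], customer_id: int, new_city: str,
                 change_date: str, keep_history: bool) -> None:
    """Apply a city change, either overwriting or versioning the row."""
    current = next(
        r for r in rows if r.customer_id == customer_id and r.valid_to is None
    )
    if current.city == new_city:
        return  # nothing changed
    if not keep_history:
        current.city = new_city  # overwrite: the old value is lost
        return
    # Preserve history: close the current version and add a new one.
    current.valid_to = change_date
    next_key = max(r.surrogate_key for r in rows) + 1
    rows.append(CustomerRow(next_key, customer_id, new_city, change_date))


dim_customer = [CustomerRow(1, 42, "Pune", "2023-01-01")]
apply_change(dim_customer, 42, "Mumbai", "2024-06-01", keep_history=True)
for row in dim_customer:
    print(row)
```

With keep_history=True the dimension ends up with two rows for the same customer, so facts recorded before the change still join to the old city.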
Fact:
A fact table consists of the measurements, metrics or facts of a
business process.
These measurable facts are used to assess business value and to forecast future
business.
The different types of facts are:
Additive:
Additive facts are facts that can be summed up through all of the dimensions in the
fact table. A sales amount is a good example of an additive fact.
Semi-Additive:
Semi-additive facts are facts that can be summed up for some of the dimensions in the
fact table, but not the others. E.g., a daily balance fact can be summed up through the
customer dimension but not through the time dimension.
Non-Additive:
Non-additive facts are facts that cannot be summed up for any of the dimensions
present in the fact table. E.g., facts that hold calculated percentages or ratios.
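A short worked example may help show why the last two fact types cannot simply be summed. The figures below are made up for illustration; the margin calculation is one assumed example of a ratio-style fact.

```python
# Hypothetical daily balance facts: (date, customer, balance).
balances = [
    ("2024-01-01", "A", 100.0),
    ("2024-01-01", "B", 50.0),
    ("2024-01-02", "A", 120.0),
    ("2024-01-02", "B", 50.0),
]

# Semi-additive: summing across customers for one day is meaningful,
# it is the total balance held on that day.
total_on_jan_2 = sum(bal for day, _, bal in balances if day == "2024-01-02")

# Summing one customer's balance across days would double count the same
# money, so an average (or the closing balance) is used instead.
a_balances = [bal for _, cust, bal in balances if cust == "A"]
average_balance_a = sum(a_balances) / len(a_balances)

# Non-additive: a margin ratio per sale, stored as (revenue, cost).
sales = [(100.0, 60.0), (300.0, 270.0)]
per_sale_margins = [(rev - cost) / rev for rev, cost in sales]

# Adding the per-sale margins gives a number with no business meaning.
# The overall margin is recomputed from the additive components instead.
total_revenue = sum(rev for rev, _ in sales)
total_cost = sum(cost for _, cost in sales)
overall_margin = (total_revenue - total_cost) / total_revenue

print(total_on_jan_2, average_balance_a, overall_margin)
```

Storing the additive components (revenue and cost) in the fact table and deriving the ratio at query time is a common way to keep such measures usable at any level of aggregation.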