DBI/PAT/2017 2
Kimball Life Cycle
Normalized vs. Dimensional
• There are two leading approaches to storing data in a data
warehouse:
1. The normalized approach (Inmon)
2. The dimensional approach (Kimball)
Other approaches also exist.
Normalized
• The data in the data warehouse are stored following, to a
degree, database normalization rules.
• Tables are grouped together by subject areas that reflect
general data categories (e.g., data on customers, products,
finance, etc.)
• The main advantage of this approach is that it is
straightforward to add information into the database.
• Disadvantage: because of the number of tables involved, it
can be difficult for users to (1) join data from different sources
into meaningful information and (2) access the information
without a precise understanding of the sources of data and
of the data structure of the data warehouse.
Dimensional Modeling
• Dimensional modeling is a logical design technique for
structuring data so that it’s intuitive to business users and
delivers fast query performance.
• Transaction data are partitioned into facts (numeric
transaction data), and dimensions (reference information that
gives context to the facts).
– For example, a sales transaction can be broken up into
• facts such as the number of products ordered and the price paid
for the products,
• dimensions such as order date, customer name, product number,
order ship-to and bill-to locations, and salesperson responsible for
receiving the order.
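The split into facts and dimensions can be sketched as a small star schema. This is a minimal illustration using Python's built-in sqlite3 module, not a schema taken from the slides: the table names (fact_sales, dim_date, dim_customer, dim_product), the column names, and all sample rows are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: reference information that gives context to the facts
    CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, order_date TEXT);
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, product_number TEXT);

    -- Fact table: numeric measurements plus one foreign key per dimension
    CREATE TABLE fact_sales (
        date_key         INTEGER REFERENCES dim_date(date_key),
        customer_key     INTEGER REFERENCES dim_customer(customer_key),
        product_key      INTEGER REFERENCES dim_product(product_key),
        quantity_ordered INTEGER,
        price_paid       REAL
    );

    -- Hypothetical sample data
    INSERT INTO dim_date     VALUES (1, '2017-03-01'), (2, '2017-03-02');
    INSERT INTO dim_customer VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO dim_product  VALUES (1, 'P-100'), (2, 'P-200');
    INSERT INTO fact_sales   VALUES
        (1, 1, 1, 2, 20.0),
        (1, 2, 2, 1, 15.0),
        (2, 1, 2, 3, 45.0);
""")

# A typical dimensional query: join the fact table to a dimension,
# group by a descriptive attribute, and sum the numeric facts.
rows = conn.execute("""
    SELECT c.customer_name, SUM(f.quantity_ordered), SUM(f.price_paid)
    FROM fact_sales AS f
    JOIN dim_customer AS c ON c.customer_key = f.customer_key
    GROUP BY c.customer_name
    ORDER BY c.customer_name
""").fetchall()
print(rows)
```

The fact table holds only keys and numbers, while every descriptive attribute lives in a dimension table, so queries follow the same join-and-aggregate pattern regardless of which dimension is used.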
Benefits of Dimensional Modeling
• Understandability: the data warehouse is easier for the user to
understand and to use.
• Query performance: retrieving data from the data warehouse
tends to be fast.
Fact and Dimensions
• Dimensional modeling implies two distinct types of data:
1. Facts
2. Dimensions
• These data are stored in two types of tables:
1. Fact tables
2. Dimension tables
Fact Table
Dimension Table
Fact-Dimension Data Model
Advantages:
• Easy to understand
• Better performance
• Extensible
Granularity
• Granularity refers to the level of detail stored in a table.
• When identifying the grain, we must specify exactly what a fact
table record means.
• Each fact and dimension table is said to have its own grain or
granularity. In other words, each table (either fact or dimension)
will have some level of detail associated with it.
• The more detail there is, the lower the level of granularity. The
less detail there is, the higher the level of granularity.
• The grain of the dimensional model is the finest level of detail
implied by the joining of the fact and dimension tables. For
example, for a dimensional model consisting of the dimensions
date (year, quarter, month, and day), store (region, district, and
store), and product (category name, brand, and product), the
granularity is product sold in store by day.
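To make the grain concrete, the sketch below stores facts at the grain "product sold in store by day" and rolls them up to the coarser grain "product sold in store by month". The row layout, the field names, and the helper roll_up_to_month are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows at the grain "product sold in store by day".
daily_sales = [
    {"day": "2017-03-01", "store": "S1", "product": "P-100", "units": 4},
    {"day": "2017-03-02", "store": "S1", "product": "P-100", "units": 6},
    {"day": "2017-03-01", "store": "S2", "product": "P-100", "units": 3},
    {"day": "2017-04-01", "store": "S1", "product": "P-100", "units": 5},
]

def roll_up_to_month(rows):
    """Aggregate day-level rows to the coarser grain month/store/product."""
    totals = defaultdict(int)
    for row in rows:
        month = row["day"][:7]  # 'YYYY-MM-DD' -> 'YYYY-MM'
        totals[(month, row["store"], row["product"])] += row["units"]
    return dict(totals)

monthly_sales = roll_up_to_month(daily_sales)
print(monthly_sales)
```

The roll-up only works in one direction: daily rows can always be summed to months, but monthly totals cannot be split back into days, which is why the finest grain that will be needed should be the one stored.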
Conformed Dimensions
• Common, standardized, master dimensions
• A conformed dimension is a dimension that has the same
meaning and content when referenced from different fact
tables. A conformed dimension can be referenced by multiple
tables in multiple data marts within the same organization. For
example, Time is a common conformed dimension because its
attributes (day, week, month, quarter, year, etc.) have the
same meaning when joined to any fact table.
• Conformed dimensions deliver consistent descriptive attributes
across dimensional models. They support the ability to drill
across and integrate data from multiple business processes.
• Reusing conformed dimensions shortens the time-to-market by
eliminating redundant design and development efforts.
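Drilling across can be sketched with two fact tables that share one conformed date dimension. Everything in this example is an assumption made for illustration: the dim_date contents, the two fact tables (orders and shipments), and the helper total_by_month.

```python
# Conformed date dimension shared by both fact tables (hypothetical data).
dim_date = {
    1: {"date": "2017-03-01", "month": "2017-03"},
    2: {"date": "2017-03-15", "month": "2017-03"},
    3: {"date": "2017-04-02", "month": "2017-04"},
}

# Two business processes, each with its own fact table: (date_key, amount).
fact_orders = [(1, 100.0), (2, 50.0), (3, 70.0)]
fact_shipments = [(2, 120.0), (3, 60.0)]

def total_by_month(fact_rows):
    """Sum a fact by the 'month' attribute of the shared date dimension."""
    totals = {}
    for date_key, amount in fact_rows:
        month = dim_date[date_key]["month"]
        totals[month] = totals.get(month, 0.0) + amount
    return totals

orders = total_by_month(fact_orders)
shipments = total_by_month(fact_shipments)

# Drill across: line up the two result sets on the conformed attribute.
drill_across = {
    month: (orders.get(month, 0.0), shipments.get(month, 0.0))
    for month in sorted(set(orders) | set(shipments))
}
print(drill_across)
```

The two result sets can be merged only because "month" means the same thing in both; with two differently defined date dimensions, the rows would not line up.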
Conformed Facts
Slowly Changing Dimensions
Slowly Changing Dimensions (2)
• Type 1
• Type 2
• Type 3
Enterprise Data Warehouse Bus Architecture
• Planning the construction of the overall DW/BI environment is a
critical activity.
• Building everything at once is too daunting.
• Building it as isolated pieces defeats the overall goals.
What is to be done?
“start with a quick and sufficient effort that defines the
overall enterprise DW/BI data architecture”
-> the enterprise data warehouse bus matrix
Enterprise Data Warehouse Bus Architecture
The Enterprise Data Warehouse Bus Matrix
• The matrix delivers the big picture perspective, regardless of
database or technology preferences, while also identifying
reasonably manageable development efforts.
• Each business process implementation incrementally builds
out the overall architecture.
• Multiple development teams can work on components of the
matrix fairly independently and asynchronously.
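A bus matrix can be represented as a mapping from business processes (rows) to the dimensions they use (columns). The sketch below is a hypothetical example: the processes, the dimensions, and the helper render are assumptions, not the matrix from the original slide.

```python
# Rows: business processes. Values: the dimensions each process uses.
# All entries are hypothetical.
bus_matrix = {
    "orders":    {"date", "customer", "product", "promotion"},
    "shipments": {"date", "customer", "product", "warehouse"},
    "inventory": {"date", "product", "warehouse"},
}

# Columns of the matrix: every dimension used by at least one process.
dimensions = sorted(set().union(*bus_matrix.values()))

def render(matrix, dims):
    """Return the matrix as text, marking each used dimension with an X."""
    lines = ["process".ljust(12) + " ".join(d.ljust(10) for d in dims)]
    for process, used in matrix.items():
        marks = " ".join(("X" if d in used else "").ljust(10) for d in dims)
        lines.append(process.ljust(12) + marks)
    return "\n".join(lines)

# Dimensions used by more than one process are the ones that must be conformed.
shared = [d for d in dimensions
          if sum(d in used for used in bus_matrix.values()) > 1]

print(render(bus_matrix, dimensions))
print("conform first:", shared)
```

Each row can be handed to a separate team as one incremental project, provided the teams agree on the shared columns up front.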
Enterprise Data Warehouse Bus Architecture
• Enterprise Data Warehouse Bus allows for incremental data
warehouse and business intelligence (DW/BI) development. It
decomposes the DW/BI planning process into manageable
pieces by focusing on the organization’s core business
processes, along with the associated conformed dimensions.
Four-Step Dimensional Design Process
1. Select the business process
• A process is a natural business activity performed in the
organization
• It is typically supported by a source data collection system.
• Example business processes include:
raw materials purchasing,
orders,
shipments,
invoicing,
inventory,
general ledger
• A business process is not an organizational business department or function. For
example, we build a single dimensional model to handle orders
data rather than building separate models for the sales and
marketing departments, which both want to access orders data.
2. Declare the grain
• Declaring the grain means specifying exactly what an individual
fact table row represents.
• The grain conveys the level of detail associated with the fact
table measurements.
• It provides the answer to the question, “How do you describe a
single row in the fact table?”
• Example grain declarations include:
– An individual line item on a customer’s retail sales ticket as
measured by a scanner device
– A line item on a bill received from a doctor
– An individual boarding pass to get on a flight
– A daily snapshot of the inventory levels for each product in a
warehouse
– A monthly snapshot for each bank account
2. Declare the grain (2)
• An inappropriate grain declaration will haunt a data warehouse
implementation.
• Declaring the grain is a critical step that can’t be taken lightly.
• If in step 3 or 4 we see that the grain statement is wrong, we
must return to step 2, redeclare the grain correctly, and revisit
steps 3 and 4.
3. Choose the dimensions
• If we are clear about the grain, then the dimensions typically can
be identified quite easily.
• Dimensions represent all possible descriptions that take on single
values in the context of each measurement.
• Examples of common dimensions include:
date,
product,
customer,
transaction type,
status.
4. Identify the facts
• Facts are determined by answering the question, “What are we
measuring?”
• All candidate facts in a design must be true to the grain defined
in step 2.
• Facts that clearly belong to a different grain must be in a separate
fact table.
• Typical facts are numeric additive figures such as quantity
ordered or dollar cost amount.
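The four steps can be recorded together as a small data structure. This is a rough sketch under stated assumptions: the class DimensionalDesign, its field names, and the sample rows are hypothetical, and the uniqueness check on grain_keys is only a simple proxy for "true to the grain", not a complete validation.

```python
from dataclasses import dataclass

@dataclass
class DimensionalDesign:
    business_process: str  # step 1: select the business process
    grain: str             # step 2: what one fact table row represents
    grain_keys: tuple      # columns assumed to identify one row at that grain
    dimensions: tuple      # step 3: choose the dimensions
    facts: tuple           # step 4: identify the facts

    def violates_grain(self, rows):
        """Return grain-key values that occur more than once in rows."""
        seen, duplicates = set(), set()
        for row in rows:
            key = tuple(row[k] for k in self.grain_keys)
            if key in seen:
                duplicates.add(key)
            seen.add(key)
        return duplicates

design = DimensionalDesign(
    business_process="retail sales",
    grain="one line item on a customer's retail sales ticket",
    grain_keys=("ticket", "line"),
    dimensions=("date", "product", "customer"),
    facts=("quantity", "dollar_amount"),
)

# Hypothetical fact rows at the declared grain.
rows = [
    {"ticket": 1001, "line": 1, "product": "P-100", "quantity": 2, "dollar_amount": 20.0},
    {"ticket": 1001, "line": 2, "product": "P-200", "quantity": 1, "dollar_amount": 15.0},
    {"ticket": 1002, "line": 1, "product": "P-100", "quantity": 3, "dollar_amount": 30.0},
]

grain_problems = design.violates_grain(rows)

# Additive facts can be summed across any combination of dimensions.
total_quantity = sum(row["quantity"] for row in rows)
total_dollars = sum(row["dollar_amount"] for row in rows)
print(grain_problems, total_quantity, total_dollars)
```

A row that repeats an existing (ticket, line) pair would be reported by violates_grain, signalling either bad data or a fact that belongs to a different grain and therefore to a separate fact table.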
Bibliography
1. Ralph Kimball, Margy Ross - The Data Warehouse
Toolkit, Second Edition, Wiley & Sons, 2007