OLTP

Online Transaction Processing System (OLTP)



An online operational database system that performs online transaction and query processing is called an Online Transaction Processing (OLTP) system. Examples include the day-to-day operations of organizations, such as purchasing, inventory, manufacturing, banking, payroll, registration, and accounting.
An OLTP system deals with operational data. Operational data are the data involved in the operation of a particular system.
Example: In a banking system, when you withdraw an amount from your account, the account number, withdrawal amount, available amount, balance amount, transaction number, etc. are operational data elements.

OLTP
In an OLTP system, data are frequently updated and queried, so a quick response to each request is expected.
Since OLTP systems involve a large number of update queries, the database tables are optimized for write operations.
To prevent data redundancy and update anomalies, the database tables are normalized. Normalization makes write operations on the database tables more efficient.
Operational data are usually of local relevance: queries access individual tuples (individual records). These types of queries are termed point queries.
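The normalization and point-query ideas above can be sketched with Python's built-in sqlite3 module. The tables (department, employee), their columns, and the sample rows are hypothetical, chosen only to mirror the example questions in this section; a real OLTP schema would be larger.

```python
import sqlite3
from typing import Optional

# Hypothetical normalized OLTP schema: each department is stored once and
# employees refer to it by key, so no department detail is duplicated.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    salary  REAL NOT NULL,
    dept_id INTEGER NOT NULL REFERENCES department(dept_id)
);
INSERT INTO department VALUES (1, 'Maths'), (2, 'Physics');
INSERT INTO employee VALUES (10, 'John', 52000, 1), (11, 'Asha', 61000, 2);
""")

def salary_of(emp_id: int) -> Optional[float]:
    """Point query: fetch one tuple through its primary key."""
    row = conn.execute(
        "SELECT salary FROM employee WHERE emp_id = ?", (emp_id,)
    ).fetchone()
    return row[0] if row else None

# Renaming a department is a single-row write; in a denormalized table the
# name would be repeated per employee and every copy would need updating.
with conn:
    conn.execute("UPDATE department SET name = 'Mathematics' WHERE dept_id = 1")

print(salary_of(10))
```

The lookup touches exactly one row through an indexed key, which is the access pattern OLTP tables are tuned for.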

Examples of OLTP queries:

What is the salary of Mr. John?
Withdraw money from a bank account: this performs an update operation when money is withdrawn from the account.
What is the address and email ID of the person who is the head of the maths department?
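The withdrawal example can be sketched as a short, atomic OLTP transaction using Python's sqlite3 module. The account and txn tables, the account number ACC-1, and the starting balance are hypothetical; the point is that the balance check, the debit, and the log entry either all happen or none do.

```python
import sqlite3

# Hypothetical tables for the withdrawal example: current balances and a transaction log.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute(
    "CREATE TABLE txn (txn_no INTEGER PRIMARY KEY AUTOINCREMENT, acc_no TEXT, amount INTEGER)"
)
conn.execute("INSERT INTO account VALUES ('ACC-1', 500)")
conn.commit()

def withdraw(acc_no: str, amount: int) -> int:
    """Debit the account and log the withdrawal as one transaction; return the new balance."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        cur = conn.execute(
            "UPDATE account SET balance = balance - ? WHERE acc_no = ? AND balance >= ?",
            (amount, acc_no, amount),
        )
        if cur.rowcount != 1:
            raise ValueError("unknown account or insufficient funds")
        conn.execute("INSERT INTO txn (acc_no, amount) VALUES (?, ?)", (acc_no, -amount))
        (balance,) = conn.execute(
            "SELECT balance FROM account WHERE acc_no = ?", (acc_no,)
        ).fetchone()
    return balance
```

For example, `withdraw("ACC-1", 200)` returns 300; a following `withdraw("ACC-1", 1000)` raises `ValueError` and leaves the balance at 300.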

What is OLAP?
Basic idea: converting data into the information that decision makers need

Concept: analyzing data along multiple dimensions in a structure called a data cube
OLAP
OLAP designates a category of applications and technologies that allow the collection, storage, manipulation, and reproduction of multidimensional data, with the goal of analysis.
History
In 1993, E. F. Codd coined the term online analytical processing (OLAP) in the paper titled "Providing OLAP (On-line Analytical Processing) to User-Analysts: An IT Mandate".
The term OLAP seems perfect to describe databases designed to facilitate decision making (analysis) in an organization.
Purpose of OLAP
To derive summarized information from large-volume databases
To generate automated reports for human viewing
Examples of OLAP queries

How is the profit changing over the years across different regions?
Is it financially viable to continue the production unit at location X?
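In contrast to an OLTP point query, the first OLAP question above scans and aggregates many rows. A minimal sketch using Python's sqlite3 module; the sales table and its rows are hypothetical stand-ins for data consolidated from several OLTP sources.

```python
import sqlite3

# Hypothetical consolidated sales data: (year, region, profit).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, region TEXT, profit REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    (2022, "North", 120.0), (2022, "North", 30.0), (2022, "South", 90.0),
    (2023, "North", 160.0), (2023, "South", 70.0), (2023, "South", 10.0),
])

# "How is the profit changing over the years across different regions?"
# becomes a GROUP BY over every matching row rather than a single-record lookup.
rows = conn.execute("""
    SELECT region, year, SUM(profit) AS total_profit
    FROM sales
    GROUP BY region, year
    ORDER BY region, year
""").fetchall()

for region, year, total in rows:
    print(region, year, total)
```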

OLAP, by Dr. Khalil 9
What and Why OLAP?
OLAP enables users to gain a deeper understanding and
knowledge about various aspects of their corporate
data through fast, consistent, interactive access to a
variety of possible views of data.
While OLAP systems can easily answer "who?" and "what?" questions, it is their ability to answer "what if?" and "why?" questions that distinguishes them from general-purpose query tools.
The types of analysis available from OLAP range from basic navigation and browsing (referred to as slicing and dicing), to calculations, to more complex analysis such as time series and complex modeling.

OLAP Applications
Finance: Budgeting, activity-based costing,
financial performance analysis, and financial
modeling.

Sales: Sales analysis and sales forecasting.

Marketing: Market research analysis, sales
forecasting, promotions analysis, customer
analysis, and market/customer segmentation.

Manufacturing: Production planning and defect
analysis.
OLAP Benefits
Increased productivity of business end-users, IT
developers, and consequently the entire organization.
Reduced backlog of applications development for IT
staff by making end-users self-sufficient enough to
make their own schema changes and build their own
models.
Retention of organizational control over the integrity of
corporate data as OLAP applications are dependent on
data warehouses and OLTP systems to refresh their
source data level.
Improved potential revenue and profitability by
enabling the organization to respond more quickly to
market demands.
OLTP system vs. OLAP system
OLTP System: Online Transaction Processing (Operational System)
OLAP System: Online Analytical Processing (Data Warehouse)

Source of data
OLTP: Operational data; OLTP systems are the original source of the data.
OLAP: Consolidated data; OLAP data comes from the various OLTP databases.

Purpose of data
OLTP: To control and run fundamental business tasks.
OLAP: To help with planning, problem solving, and decision support.

What the data reveals
OLTP: A snapshot of ongoing business processes.
OLAP: Multi-dimensional views of various kinds of business activities.

Inserts and updates
OLTP: Short and fast inserts and updates initiated by end users.
OLAP: Periodic long-running batch jobs refresh the data.

Queries
OLTP: Relatively standardized and simple queries returning relatively few records.
OLAP: Often complex queries involving aggregations.

Processing speed
OLTP: Typically very fast.
OLAP: Depends on the amount of data involved; batch data refreshes and complex queries may take many hours; query speed can be improved by creating indexes.

Space requirements
OLTP: Can be relatively small if historical data is archived.
OLAP: Larger due to the existence of aggregation structures and history data; requires more indexes than OLTP.

Database design
OLTP: Highly normalized with many tables.
OLAP: Typically de-normalized with fewer tables; use of star and/or snowflake schemas.

Backup and recovery
OLTP: Back up religiously; operational data is critical to run the business, and data loss is likely to entail significant monetary loss and legal liability.
OLAP: Instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method.
Schema
Pronounced skee-ma, the schema is the structure of a database system, described in a formal language supported by the database management system (DBMS). In a relational database, the schema defines the tables, the fields in each table, and the relationships between fields and tables.
Schemas are generally stored in a data dictionary. Although a schema is defined in a textual database language, the term is often used to refer to a graphical depiction of the database structure.

Types of Schemas
In databases:
Hierarchical model
Network model
Relational model (RDBMS)
In data warehouses:
Star schema
Snow-flake schema

Star schema
The star schema architecture is the simplest data warehouse schema. It is called a star schema because the diagram resembles a star, with points radiating from a center. The center of the star consists of the fact table, and the points of the star are the dimension tables. Usually the fact tables in a star schema are in third normal form (3NF), whereas dimension tables are de-normalized. Although the star schema is the simplest architecture, it is the most commonly used nowadays and is recommended by Oracle.
Star Schema
Fact Tables

A fact table typically has two types of columns: foreign keys to dimension tables, and measures, which contain raw numeric items that represent relevant business facts. A fact table can contain facts at a detailed or aggregated level, so it tends to be very large.
Star Schema
Dimension Tables
A dimension table is a structure, usually composed of one or more hierarchies, that categorizes data. If a dimension has no hierarchies and levels, it is called a flat dimension or list. These tables are joined to the fact table using foreign key references. Dimension tables are generally smaller than the fact table.

Typical fact tables store data about sales, while dimension tables store data about geographic regions (markets, cities), customers, products, and time.
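A star schema with a sales fact table and product, store, and time dimensions can be sketched in SQLite through Python's sqlite3 module. All table names, columns, and rows are hypothetical; the sketch shows the denormalized dimensions (brand and category stored inline in dim_product) and a star join in which every dimension is exactly one join away from the fact table.

```python
import sqlite3

# Hypothetical star schema: one fact table, three denormalized dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, brand TEXT, category TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT, region TEXT);
CREATE TABLE dim_time    (time_id    INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),  -- foreign key
    store_id   INTEGER REFERENCES dim_store(store_id),      -- foreign key
    time_id    INTEGER REFERENCES dim_time(time_id),        -- foreign key
    units      INTEGER,                                     -- measure
    amount     REAL                                         -- measure
);
INSERT INTO dim_product VALUES (1, 'Milk', 'DairyCo', 'Dairy'), (2, 'Bread', 'BakeCo', 'Bakery');
INSERT INTO dim_store   VALUES (1, 'NY', 'East'), (2, 'LA', 'West');
INSERT INTO dim_time    VALUES (1, '2024-01-01', '2024-01', 2024), (2, '2024-01-02', '2024-01', 2024);
INSERT INTO fact_sales  VALUES (1, 1, 1, 10, 25.0), (2, 2, 1, 56, 112.0), (2, 1, 2, 5, 10.0);
""")

# Star join: the fact table joins each dimension directly (one hop per dimension).
query = """
SELECT p.category, s.region, SUM(f.units) AS units
FROM fact_sales f
JOIN dim_product p ON p.product_id = f.product_id
JOIN dim_store   s ON s.store_id   = f.store_id
JOIN dim_time    t ON t.time_id    = f.time_id
WHERE t.year = 2024
GROUP BY p.category, s.region
ORDER BY p.category, s.region
"""
result = conn.execute(query).fetchall()
print(result)
```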
Characteristics of star schema:

Simple structure -> easy-to-understand schema
High query effectiveness -> small number of tables to join
Relatively long time to load data into dimension tables -> de-normalized
The most commonly used in data warehouse implementations -> widely supported by a large number of business intelligence tools
Snowflake schema
It is a logical arrangement of tables in a
multidimensional database such that the entity
relationship diagram resembles a snowflake
shape. The snowflake schema is represented by
centralized fact tables which are connected to
multiple dimensions. "Snowflaking" is a method
of normalising the dimension tables in a star
schema. When it is completely normalised along
all the dimension tables, the resultant structure
resembles a snowflake with the fact table in the
middle. The principle behind snowflaking is
normalisation of the dimension tables.
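Snowflaking a product dimension can be sketched by splitting it into product, brand, and category tables. The names and rows are hypothetical; the sketch shows the trade-off the comparison below describes: each category name is stored once, but reaching it from the fact table now takes a chain of joins.

```python
import sqlite3

# Hypothetical snowflaked product dimension: brand and category are normalized
# out of the product table into their own tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_brand    (brand_id INTEGER PRIMARY KEY, brand TEXT,
                           category_id INTEGER REFERENCES dim_category(category_id));
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, name TEXT,
                           brand_id INTEGER REFERENCES dim_brand(brand_id));
CREATE TABLE fact_sales   (product_id INTEGER REFERENCES dim_product(product_id), units INTEGER);
INSERT INTO dim_category VALUES (1, 'Dairy'), (2, 'Bakery');
INSERT INTO dim_brand    VALUES (1, 'DairyCo', 1), (2, 'BakeCo', 2);
INSERT INTO dim_product  VALUES (1, 'Milk', 1), (2, 'Cream', 1), (3, 'Bread', 2);
INSERT INTO fact_sales   VALUES (1, 10), (2, 4), (3, 56);
""")

# Fact -> product -> brand -> category: three joins where a star schema needs one.
rows = conn.execute("""
SELECT c.category, SUM(f.units)
FROM fact_sales f
JOIN dim_product  p ON p.product_id  = f.product_id
JOIN dim_brand    b ON b.brand_id    = p.brand_id
JOIN dim_category c ON c.category_id = b.category_id
GROUP BY c.category
ORDER BY c.category
""").fetchall()
print(rows)
```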
Snowflake schema vs. star schema

Ease of maintenance / change
Snowflake: No redundancy, and hence easier to maintain and change
Star: Has redundant data, and hence less easy to maintain/change

Ease of use
Snowflake: More complex queries, and hence less easy to understand
Star: Less complex queries, and easy to understand

Query performance
Snowflake: More foreign keys, and hence longer query execution time
Star: Fewer foreign keys, and hence shorter query execution time

Type of data warehouse
Snowflake: Good to use for the data warehouse core to simplify complex relationships (many:many)
Star: Good for data marts with simple relationships (1:1 or 1:many)

Joins
Snowflake: Higher number of joins
Star: Fewer joins

Dimension tables
Snowflake: May have more than one dimension table for each dimension
Star: Contains only a single dimension table for each dimension

When to use
Snowflake: When the dimension table is relatively big, snowflaking is better as it reduces space.
Star: When the dimension table contains fewer rows, we can go for a star schema.

Normalization / de-normalization
Snowflake: Dimension tables are in normalized form, but the fact table is still in de-normalized form
Star: Both dimension and fact tables are in de-normalized form

Data model
Snowflake: Bottom-up approach
Star: Top-down approach
Cube
A cube is a multidimensional structure that contains information for analytical purposes; the main constituents of a cube are dimensions and measures. Dimensions define the structure of the cube that you use to slice and dice over, and measures provide aggregated numerical values of interest to the end user. As a logical structure, a cube allows a client application to retrieve measure values as if they were contained in cells in the cube; cells are defined for every possible summarized value. A cell in the cube is defined by the intersection of dimension members and contains the aggregated values of the measures at that specific intersection.
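The idea that "cells are defined for every possible summarized value" can be sketched in plain Python: each fact contributes to its base cell and to every cell obtained by summarizing some of its dimensions. The sketch uses the six sale facts (prodId, storeId, date, amt) from the 3-D cube example in this section; the `*` marker for a summarized dimension is an assumption of the sketch, not a standard notation.

```python
from collections import defaultdict
from itertools import combinations

# Sale facts (prodId, storeId, date, amt) from the 3-D cube example.
facts = [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
]
DIMS = ("prodId", "storeId", "date")
ALL = "*"  # placeholder member meaning "summarized over this dimension"

def build_cube(facts):
    """Return {cell: total amt} for the base cells and every summarized cell."""
    cube = defaultdict(int)
    n = len(DIMS)
    for *members, amt in facts:
        # Each fact feeds 2**n cells: one per subset of dimensions kept at member level.
        for k in range(n + 1):
            for kept in combinations(range(n), k):
                cell = tuple(members[i] if i in kept else ALL for i in range(n))
                cube[cell] += amt
    return dict(cube)

cube = build_cube(facts)
print(cube[("p1", "s1", 1)])   # base cell
print(cube[("p1", ALL, ALL)])  # all sales of p1
print(cube[(ALL, ALL, ALL)])   # grand total
```

This prints 12, 110 and 129. Precomputing all these cells is what lets a cube answer summary questions without rescanning the facts.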
Benefit of Using Cubes
A cube provides a single place where all related data, for
analysis, is stored.

3-D Cube
Dimensions = 3
Fact table view:
sale(prodId, storeId, date, amt)
p1 s1 1 12
p2 s1 1 11
p1 s3 1 50
p2 s2 1 8
p1 s1 2 44
p1 s2 2 4
Multi-dimensional cube view (one 2-D slice per day; "-" means no sale):
day 1
      s1  s2  s3
p1    12   -  50
p2    11   8   -
day 2
      s1  s2  s3
p1    44   4   -
p2     -   -   -
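Turning the flat fact table into the per-day slices can be sketched as a simple pivot in Python. The nested-dictionary layout and the fixed text rendering are assumptions made for illustration; OLAP servers use their own internal structures.

```python
from collections import defaultdict

# Sale facts (prodId, storeId, date, amt) from the fact table view.
facts = [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
]

def day_slices(facts):
    """Pivot the flat fact table into {date: {prodId: {storeId: amt}}}."""
    grid = defaultdict(lambda: defaultdict(dict))
    for prod, store, date, amt in facts:
        grid[date][prod][store] = amt
    return grid

def render(grid, date, prods=("p1", "p2"), stores=("s1", "s2", "s3")):
    """Format one day's slice as a product x store table; '-' marks an empty cell."""
    lines = ["    " + "".join(f"{s:>4}" for s in stores)]
    for p in prods:
        cells = grid[date].get(p, {})
        lines.append(f"{p:<4}" + "".join(f"{cells.get(s, '-'):>4}" for s in stores))
    return "\n".join(lines)

grid = day_slices(facts)
print(render(grid, 1))
print(render(grid, 2))
```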
Example
(Figure: a 3-D sales cube with a Product axis (Juice, Milk, Coke, Cream, Soap, Bread), a Time axis (M, T, W, Th, F, S, S) and a Store axis (NY, SF, LA); one highlighted cell shows that 56 units of bread were sold in LA on M.)
Dimensions:
Time, Product, Store
Attributes:
Product (upc, price, ...)
Store
Hierarchies:
Product -> Brand (roll-up to brand)
Day -> Week -> Quarter (roll-up to week)
Store -> Region -> Country (roll-up to region)
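A roll-up along these hierarchies replaces each member by its parent and re-aggregates. A minimal Python sketch on the sale facts from the 3-D cube example; the store-to-region mapping and the seven-day week calendar are hypothetical, since the source names the hierarchy levels but not their members.

```python
from collections import defaultdict

# Sale facts (prodId, storeId, date, amt) from the 3-D cube example.
facts = [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
]

# Hypothetical Store -> Region mapping (not given in the source).
STORE_TO_REGION = {"s1": "East", "s2": "East", "s3": "West"}

def day_to_week(day: int) -> int:
    """Hypothetical calendar: days 1-7 are week 1, days 8-14 week 2, and so on."""
    return (day - 1) // 7 + 1

def roll_up(facts, parents):
    """Map members to their parent level (per dimension index) and re-aggregate amt."""
    out = defaultdict(int)
    for *members, amt in facts:
        key = tuple(parents[i](m) if i in parents else m for i, m in enumerate(members))
        out[key] += amt
    return dict(out)

by_region = roll_up(facts, {1: lambda s: STORE_TO_REGION[s]})
by_region_week = roll_up(facts, {1: lambda s: STORE_TO_REGION[s], 2: day_to_week})
print(by_region)
print(by_region_week)
```

Rolling up both Store and Day merges cells such as (p1, s1, day 1) and (p1, s2, day 2) into the single cell (p1, East, week 1).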
Representation of Multi-Dimensional Data
OLAP database servers use multi-dimensional structures to store
data and relationships between data.
Multi-dimensional structures are best visualized as cubes of data, and cubes within cubes of data. Each side of a cube is a dimension.
Representation of Multi-Dimensional Data
Multi-dimensional databases are a compact and easy-to-understand way of
visualizing and manipulating data elements that have many inter-
relationships.
The cube can be expanded to include another dimension, for example, the
number of sales staff in each city.
The response time of a multi-dimensional query depends on how many cells
have to be added on-the-fly.
As the number of dimensions increases, the number of cube cells increases exponentially.
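The exponential growth follows from the fact that the number of cells is the product of the dimension sizes: d dimensions of n members each give n to the power d cells, and (n + 1) to the power d once an "all" (summarized) member is added per dimension. A small Python sketch of that count:

```python
from math import prod

def cube_cells(cardinalities, with_totals=False):
    """Count the cells of a cube whose dimensions have the given numbers of members.

    With totals, every dimension also gets one summarized 'ALL' member,
    so each dimension contributes n + 1 positions instead of n.
    """
    return prod(n + 1 if with_totals else n for n in cardinalities)

# d dimensions of 10 members each: the count grows as 10**d (or 11**d with totals).
for d in range(1, 6):
    print(d, cube_cells([10] * d), cube_cells([10] * d, with_totals=True))
```

For the product/time/store example earlier (6 products, 7 days, 3 cities), `cube_cells([6, 7, 3])` gives 126 base cells; adding one more 10-member dimension multiplies that by 10.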

Representation of Multi-Dimensional Data
Multi-dimensional OLAP supports common analytical
operations, such as:
Consolidation: involves the aggregation of data such
as roll-ups or complex expressions involving
interrelated data. For example, branch offices can be
rolled up to cities and rolled up to countries.
Drill-Down: is the reverse of consolidation and
involves displaying the detailed data that comprises
the consolidated data.
Slicing and dicing: refers to the ability to look at the
data from different viewpoints. Slicing and dicing is
often performed along a time axis in order to analyze
trends and find patterns.
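The operations above can be sketched as small functions over the sale facts from the 3-D cube example. Function names are hypothetical; dimensions are addressed by position (0 = prodId, 1 = storeId, 2 = date). Drill-down is shown as the reverse of consolidation: the same data re-aggregated while keeping a finer set of dimensions.

```python
from collections import defaultdict

# Sale facts (prodId, storeId, date, amt) from the 3-D cube example.
facts = [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
]

def consolidate(rows, keep):
    """Roll up: sum amt over every dimension whose index is not in `keep`."""
    out = defaultdict(int)
    for *members, amt in rows:
        out[tuple(members[i] for i in keep)] += amt
    return dict(out)

def slice_(rows, dim, value):
    """Slice: fix one dimension to a single member."""
    return [r for r in rows if r[dim] == value]

def dice(rows, selections):
    """Dice: keep rows whose members fall in a chosen subset on several dimensions."""
    return [r for r in rows if all(r[d] in allowed for d, allowed in selections.items())]

print(consolidate(facts, keep=(0,)))       # consolidation: totals per product
print(consolidate(facts, keep=(0, 1)))     # drill-down: per product and store
print(slice_(facts, dim=2, value=1))       # slice: day 1 only
print(dice(facts, {0: {"p1"}, 1: {"s1", "s2"}}))  # dice: p1 in stores s1, s2
```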

OLAP cube basics
Measures
Dimensions
Hierarchies
Levels
OLAP implementation
Multidimensional OLAP (MOLAP)
Relational OLAP (ROLAP)
Hybrid OLAP (HOLAP)
Multi-dimensional OLAP (MOLAP)
MOLAP tools use specialized data structures and multi-dimensional database
management systems (MDDBMS) to organize, navigate, and analyze data.
To enhance query performance the data is typically aggregated and stored according to
predicted usage.
MOLAP data structures use array technology and efficient storage techniques that
minimize the disk space requirements through sparse data management.
The development issues associated with MOLAP:
Only a limited amount of data can be efficiently stored and analyzed.
Navigation and analysis of data are limited because the data is designed according
to previously determined requirements.
MOLAP products require a different set of skills and tools to build and maintain the
database.
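The "array technology" and "sparse data management" points can be illustrated with a toy Python sketch: cells are addressed by a computed array offset, and a sparse store keeps only non-empty cells. The row-major offset scheme and the dict-based sparse store are assumptions for illustration, not the internal format of any MOLAP product.

```python
# Member lists per dimension (taken from the 3-D cube example); positions are array indices.
PRODUCTS = ["p1", "p2"]
STORES = ["s1", "s2", "s3"]
DATES = [1, 2]
SHAPE = (len(PRODUCTS), len(STORES), len(DATES))

def offset(p, s, d):
    """Row-major position of cell (p, s, d) in a flattened 3-D array."""
    i, j, k = PRODUCTS.index(p), STORES.index(s), DATES.index(d)
    return (i * SHAPE[1] + j) * SHAPE[2] + k

facts = [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
]

dense = [0] * (SHAPE[0] * SHAPE[1] * SHAPE[2])  # one slot for every possible cell
sparse = {}                                       # only cells that actually hold data
for p, s, d, amt in facts:
    pos = offset(p, s, d)
    dense[pos] += amt
    sparse[pos] = sparse.get(pos, 0) + amt

print(len(dense), len(sparse))  # slots allocated vs. cells stored
print(dense[offset("p1", "s3", 1)], sparse.get(offset("p2", "s3", 2), 0))
```

Here the dense array allocates 12 slots for 6 non-empty cells; with more dimensions the empty fraction usually grows, which is why sparse handling matters.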
Relational OLAP (ROLAP)
ROLAP is the fastest-growing type of OLAP tools.
ROLAP supports RDBMS products through the use of a metadata layer, thus avoiding the
requirement to create a static multi-dimensional data structure.
This facilitates the creation of multiple multi-dimensional views of the two-dimensional relation.
To improve performance, some ROLAP products have enhanced SQL engines to support the
complexity of multi-dimensional analysis, while others recommend, or require, the use of highly
denormalized database designs such as the star schema.
The development issues associated with ROLAP technology:
Performance problems associated with the processing of complex queries that require
multiple passes through the relational data.
Development of middleware to facilitate the development of multi-dimensional applications.
Development of an option to create persistent multi-dimensional structures, together with facilities to assist in the administration of these structures.
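The ROLAP metadata layer can be sketched as a mapping from logical cube names (dimensions, measures) to relational columns, with each multidimensional request translated into a GROUP BY query against ordinary tables. The METADATA dictionary and the `to_sql` function are hypothetical; the data are the sale facts from the 3-D cube example, loaded into SQLite.

```python
import sqlite3

# Hypothetical metadata layer: logical cube names -> relational table/columns.
METADATA = {
    "fact_table": "sale",
    "measures": {"amount": "SUM(amt)"},
    "dimensions": {"product": "prodId", "store": "storeId", "day": "date"},
}

def to_sql(dims, measure):
    """Translate a multidimensional request into a GROUP BY query on the fact table."""
    cols = [METADATA["dimensions"][d] for d in dims]
    select = ", ".join(cols + [f'{METADATA["measures"][measure]} AS {measure}'])
    sql = f'SELECT {select} FROM {METADATA["fact_table"]}'
    if cols:
        sql += " GROUP BY " + ", ".join(cols) + " ORDER BY " + ", ".join(cols)
    return sql

facts = [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", facts)

print(to_sql(["product"], "amount"))
print(conn.execute(to_sql(["product"], "amount")).fetchall())
```

No static cube is built: every view is computed from the relational table at query time, which is also why complex ROLAP queries can be slow.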
HOLAP
a hybrid of ROLAP and MOLAP
can be thought of as a virtual database
whereby the higher levels of the database are
implemented as MOLAP and the lower levels
of the database as ROLAP
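The split between higher and lower levels can be sketched as a query router: summary requests are answered from precomputed aggregates (the MOLAP-like part), while detail requests fall back to SQL over relational rows (the ROLAP-like part). The `total_sales` function, the per-product aggregate level, and the in-memory SQLite table are assumptions for illustration, using the sale facts from the 3-D cube example.

```python
import sqlite3

facts = [
    ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
]

# ROLAP side (lower levels): detail rows stay in a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", facts)

# MOLAP side (higher levels): per-product totals precomputed once and held in memory.
product_totals = {}
for prod_id, _store, _date, amt in facts:
    product_totals[prod_id] = product_totals.get(prod_id, 0) + amt

def total_sales(prod_id, store_id=None):
    """Return (value, source): summary requests use the aggregates, detail requests use SQL."""
    if store_id is None:
        return product_totals.get(prod_id, 0), "aggregate"
    (value,) = conn.execute(
        "SELECT COALESCE(SUM(amt), 0) FROM sale WHERE prodId = ? AND storeId = ?",
        (prod_id, store_id),
    ).fetchone()
    return value, "relational"

print(total_sales("p1"))
print(total_sales("p1", "s1"))
```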
Hybrid OLAP (HOLAP)
HOLAP tools provide limited analysis capability, either directly against RDBMS
products, or by using an intermediate MOLAP server.
HOLAP tools deliver selected data directly from the DBMS or via a MOLAP server to the desktop (or local server) in the form of a data cube, where it is stored, analyzed, and maintained locally.
The issues associated with HOLAP tools:
The architecture results in significant data redundancy and may cause
problems for networks that support many users.
Ability of each user to build a custom data cube may cause a lack of data
consistency among users.
Only a limited amount of data can be efficiently maintained.
MOLAP (Multidimensional Online Analytical Processing)
The MOLAP storage mode causes the aggregations of the partition and a copy of its source data to be stored in a multidimensional structure in Analysis Services when the partition is processed. This MOLAP structure is highly optimized to maximize query performance. The storage location can be on the computer where the partition is defined or on another computer running Analysis Services. Because a copy of the source data resides in the multidimensional structure, queries can be resolved without accessing the partition's source data.
Query response times can be decreased substantially by using aggregations. The data in the partition's MOLAP structure is only as current as the most recent processing of the partition.

ROLAP (Relational Online Analytical Processing)
The ROLAP storage mode causes the aggregations of the partition to be stored in indexed views in the relational database that was specified in the partition's data source. Unlike the MOLAP storage mode, ROLAP does not cause a copy of the source data to be stored in the Analysis Services data folders. Instead, when results cannot be derived from the query cache, the indexed views in the data source are accessed to answer queries.
Query response is generally slower with ROLAP storage than with the MOLAP or HOLAP storage modes. Processing time is also typically slower with ROLAP. However, ROLAP enables users to view data in real time and can save storage space when you are working with large datasets that are infrequently queried, such as purely historical data.

HOLAP (Hybrid Online Analytical Processing)
The HOLAP storage mode combines attributes of both MOLAP and ROLAP. Like MOLAP, HOLAP causes the aggregations of the partition to be stored in a multidimensional structure in an SQL Server Analysis Services instance. HOLAP does not cause a copy of the source data to be stored. For queries that access only summary data in the aggregations of a partition, HOLAP is the equivalent of MOLAP.
Queries that access source data (for example, if you want to drill down to an atomic cube cell for which there is no aggregation data) must retrieve data from the relational database and will not be as fast as they would be if the source data were stored in the MOLAP structure. With HOLAP storage mode, users will typically experience substantial differences in query times depending upon whether the query can be resolved from cache or aggregations versus from the source data itself.
