
Data Warehousing

A data warehouse is a subject-oriented, integrated, time-variant, and nonupdatable collection of data in support of management's decision-making process.
Subject-Oriented
The warehouse is organized around high-level entities such as customers, patients, students, products, and time.

Integrated

Data is gathered from several internal systems of record or from sources external to the organization.

Time-Variant

A time dimension is used in data warehousing to study trends and changes.

Nonupdatable

New data is always added as a supplement to the database rather than as a replacement. The database continually absorbs this new data, incrementally integrating it with previous data.
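As a rough illustration of the nonupdatable and time-variant properties, the sketch below appends dated snapshots instead of overwriting rows. It uses Python's built-in sqlite3 module; the table name `customer_balance`, its columns, and the helper `load_snapshot` are hypothetical and chosen only for this example.

```python
import sqlite3

# Hypothetical table and column names, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_balance ("
    " customer_id INTEGER, balance REAL, load_date TEXT)"
)

def load_snapshot(conn, load_date, rows):
    """Append one period's data; existing rows are never updated or deleted."""
    conn.executemany(
        "INSERT INTO customer_balance VALUES (?, ?, ?)",
        [(customer_id, balance, load_date) for customer_id, balance in rows],
    )

# Each load supplements the previous one instead of replacing it.
load_snapshot(conn, "2024-01-31", [(1, 100.0), (2, 250.0)])
load_snapshot(conn, "2024-02-29", [(1, 140.0), (2, 200.0)])

# Because history is kept, the change over time for one customer can be read back.
history = conn.execute(
    "SELECT load_date, balance FROM customer_balance"
    " WHERE customer_id = 1 ORDER BY load_date"
).fetchall()
print(history)
```

The load date acts as a simple time dimension: every query can ask what the data looked like at a given point in time.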

A data warehouse can consist of more than one database.

In simple words: a data warehouse is a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use in a business context.

Problem: Heterogeneous Information Sources


Heterogeneities are everywhere:
Personal Databases
Scientific Databases
Digital Libraries
World Wide Web

Different interfaces
Different data representations
Duplicate and inconsistent information


Combined research results from different bioinformatics repositories

Goal: Unified Access to Data

[Figure: an integration system placed over the World Wide Web, digital libraries, scientific databases, and personal databases]

The integration system:
Collects and combines information
Provides an integrated view and a uniform user interface
Supports sharing

The Need for Data Warehousing


1. A business requires an integrated, companywide view of high-quality information.
2. The information systems department must separate informational systems from operational systems (systems of record) to improve performance dramatically in managing company data.

Why a Warehouse
For analysis and decision support, end users require access to data captured and stored in an organization's operational or production systems. This data is stored in multiple formats, on multiple platforms, in multiple data structures, with multiple names, and was probably created using different business rules.
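To make the "multiple formats, multiple names" problem concrete, here is a minimal sketch of mapping records from two source systems onto one warehouse layout. The two systems, their field names, and the functions `from_billing` and `from_crm` are all assumptions invented for this example.

```python
from datetime import datetime

# Two hypothetical source systems describing the same customer with
# different field names, value types, and date formats.
billing_rows = [{"CUST_NO": "17", "NAME": "ACME LTD", "OPENED": "03/15/2021"}]
crm_rows = [{"customer_id": 17, "customer_name": "Acme Ltd", "since": "2021-03-15"}]

def from_billing(row):
    """Translate a billing-system record into the unified layout."""
    return {
        "customer_id": int(row["CUST_NO"]),
        "name": row["NAME"].title(),
        "opened": datetime.strptime(row["OPENED"], "%m/%d/%Y").date(),
    }

def from_crm(row):
    """Translate a CRM record into the same unified layout."""
    return {
        "customer_id": row["customer_id"],
        "name": row["customer_name"].title(),
        "opened": datetime.strptime(row["since"], "%Y-%m-%d").date(),
    }

unified = [from_billing(r) for r in billing_rows] + [from_crm(r) for r in crm_rows]

# After translation, both sources agree on names, types, and formats.
print(unified[0] == unified[1])
```

Once every source is translated into the same layout, duplicates and inconsistencies become visible and can be resolved before loading.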

Why should we consider data warehousing solutions?


When users are requesting access to a large amount of historical information for reporting purposes, you should strongly consider a warehouse. The user will benefit when the information is organized in an efficient manner for this type of access.

An example illustrating the need for data warehousing

Data Warehouse Components


[Figure: components of a combined data warehouse]
Source systems: mainframe applications, PC applications, midrange systems, and external sources (data stores named in the figure: DB/2, VSAM, IMS, DB2/2, DB/6000, DB/400)
Load tools: extract programs, data cleansers/scrubbers, translators/transformers, timing tools, data loading, file transfer
Combined data warehouse subjects: reserves, customers, rates, policies, claims, premiums
Decision support tools: management reporting, sales/marketing, customer relations, reserve analysis, risk analysis

Administration and Management Tools


A data warehouse requires tools to support the administration and management of such a complex environment. To handle the various types of meta-data and the day-to-day operations of the data warehouse, the administration and management tools must be capable of supporting the following tasks:
monitoring data loading from multiple sources
data quality and integrity checks
managing and updating meta-data
monitoring database performance to ensure efficient query response times and resource utilization
auditing data warehouse usage to provide user chargeback information
replicating, subsetting, and distributing data
maintaining efficient data storage management
archiving and backing up data
implementing recovery following failure
security management

Data Flow
In computing, data flow is the path of data from source document to data entry to processing to final reports. Data changes format and sequence (within a file) as it moves from program to program.
Inflow: the processes associated with the extraction, cleansing, and loading of the data from the source systems into the data warehouse.

Upflow: the processes associated with adding value to the data in the warehouse through summarizing and distributing the data.

Downflow: the processes associated with archiving and backing up the data in the warehouse.

Outflow: the processes associated with making the data available to the end users.

Meta-flow: the processes associated with the management of the meta-data.
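Three of these flows (inflow, upflow, and outflow) can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the in-memory lists stand in for real storage, and the record layout and the function names `inflow`, `upflow`, and `outflow` are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw records from a source system, including messy values.
source_rows = [
    {"region": " north ", "amount": "120.50"},
    {"region": "North", "amount": "79.50"},
    {"region": "south", "amount": "n/a"},  # unusable amount
    {"region": "South", "amount": "300"},
]

warehouse = []   # detailed data held in the warehouse
summaries = {}   # value-added data produced by the upflow

def inflow(rows):
    """Extract, cleanse, and load source rows into the warehouse."""
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # cleansing: skip rows whose amount cannot be parsed
        region = row["region"].strip().lower()  # cleansing: normalize names
        warehouse.append({"region": region, "amount": amount})

def upflow():
    """Add value to warehouse data by summarizing it per region."""
    totals = defaultdict(float)
    for row in warehouse:
        totals[row["region"]] += row["amount"]
    summaries.update(totals)

def outflow(region):
    """Make summarized data available to an end user."""
    return summaries.get(region)

inflow(source_rows)
upflow()
print(outflow("north"), outflow("south"))
```

Downflow (archiving and backup) and meta-flow (managing the meta-data) are left out of the sketch; in practice they run alongside these three.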

Architectures
Many database architectures have been implemented. Two are worth noting:
1. OLTP (Online Transaction Processing)
2. Data Warehouse (OLAP, Online Analytical Processing)
OLTP is used to store data and query it frequently, and is based on normalized schemas. A data warehouse is used to store historical data and is based on fact tables and dimension tables.
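The fact-table and dimension-table layout can be sketched with Python's built-in sqlite3 module. The table names (`fact_sales`, `dim_product`, `dim_date`), their columns, and the sample rows are hypothetical; the point is only that measures live in the fact table and descriptive attributes live in the dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold descriptive attributes (hypothetical names).
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER);

-- The fact table holds measures plus a key into each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    amount     REAL
);

INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
INSERT INTO dim_date    VALUES (10, 2023), (11, 2024);
INSERT INTO fact_sales  VALUES (1, 10, 20.0), (1, 11, 30.0), (2, 11, 50.0);
""")

# A typical analytical query: join facts to dimensions, then aggregate.
sales_by_year_category = conn.execute("""
    SELECT d.year, p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_date d    ON d.date_id = f.date_id
    GROUP BY d.year, p.category
    ORDER BY d.year, p.category
""").fetchall()
print(sales_by_year_category)
```

An OLTP system would more typically read or update a single row by primary key, whereas this query scans the fact table and summarizes it along two dimensions.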

Difference between OLTP and Data Warehouse


                     OLTP                                     OLAP
users                clerk, IT professional                   knowledge worker
function             day-to-day operations                    decision support
DB design            application-oriented                     subject-oriented
data                 current, up-to-date, detailed            historical, summarized, multidimensional, integrated
access               read/write, index/hash on primary key    lots of scans
unit of work         short, simple transaction                complex query
# records accessed   tens                                     millions
# users              thousands                                hundreds
DB size              100MB-GB                                 100GB-TB

Special Thanks to

Google.com
and other sites.

Thank You
