
Architecture of Data Warehouse

By: Er. Manu Bansal (Assistant Professor), Dept. of IT, mrmanubansal@gmail.com

Data Warehouse- Concept

A data warehouse refers to a database that is maintained separately from an organization's operational databases. The construction of data warehouses involves data cleaning, data integration, and data transformation. Data warehousing also forms an essential step in the knowledge discovery process.

Data Warehouse vs. Database


The four keywords that distinguish data warehouses from other data repository systems, such as relational database systems, transaction processing systems, and file systems, are:

- Subject-oriented
- Integrated
- Time-variant
- Nonvolatile

Three-Tier Architecture


[Figure: Three-tier data warehouse architecture. Bottom tier (data sources and data storage): operational databases and other sources are extracted, transformed, loaded, and refreshed into the data warehouse and data marts, coordinated by a monitor and integrator and described by metadata. Middle tier (OLAP engine): OLAP servers that serve the data. Top tier (front-end tools): analysis, query, reports, and data mining.]

Typical Components of a Data Warehouse Architecture

Operational data

Without the source systems, there would be no data warehouse. The data sources for the data warehouse are supplied as follows:

- Operational data held in network databases
- Departmental data held in file systems
- Private data held on workstations and private servers
- External systems such as the Internet, commercially available databases, or databases associated with an organization's suppliers or customers

Operational Data Store(ODS)

An ODS is a repository of current and integrated operational data used for analysis. It is often structured and supplied with data in the same way as the data warehouse, but it may in fact simply act as a staging area for data to be moved into the warehouse. The objectives of an ODS are to integrate information from day-to-day systems, to allow operational lookups, and to relieve the day-to-day systems of reporting and current-data analysis demands. An ODS can be a helpful step towards building a data warehouse because it can supply data that has already been extracted from the source systems and cleaned.

Load Manager

Also called the front-end component. Data is extracted from the operational systems directly, or from the operational data store, and then loaded into the data warehouse. The load manager performs all the operations associated with the extraction and loading of data into the warehouse. These operations include sourcing, acquisition, cleanup, and transformation tools that prepare the data for entry into the warehouse. The functionality includes (see the sketch after this list):

- Removing unwanted data from operational databases
- Converting to common data names and definitions
- Calculating summaries
- Establishing defaults for missing data
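A minimal sketch of these load-manager steps in Python, assuming the operational data arrives as a list of dictionaries; the field names, rename/default rules, and the "test" status filter are hypothetical illustrations, not part of the original material.

```python
# Minimal load-manager sketch: clean, standardize, default, and summarize
# operational records before loading. Field names and defaults are hypothetical.
from collections import defaultdict

RENAME = {"cust_name": "customer_name", "amt": "amount"}   # common names/definitions
DEFAULTS = {"region": "UNKNOWN", "amount": 0.0}            # defaults for missing data

def transform(records):
    cleaned = []
    for rec in records:
        if rec.get("status") == "test":          # remove unwanted operational rows
            continue
        row = {RENAME.get(k, k): v for k, v in rec.items()}
        for field, default in DEFAULTS.items():  # establish defaults for missing data
            row.setdefault(field, default)
        cleaned.append(row)
    return cleaned

def summarize(rows):
    totals = defaultdict(float)                  # calculate summaries per region
    for row in rows:
        totals[row["region"]] += float(row["amount"])
    return dict(totals)

if __name__ == "__main__":
    source = [{"cust_name": "Acme", "amt": 120.5, "region": "North"},
              {"cust_name": "Test", "amt": 1.0, "status": "test"},
              {"cust_name": "Zen", "amt": 80.0}]
    rows = transform(source)
    print(rows)
    print(summarize(rows))
```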

Warehouse Manager

Performs all the operations associated with the management of the data in the warehouse, as follows (see the sketch after this list):

- Analysis of data to ensure consistency
- Transformation and merging of source data from temporary storage into the data warehouse tables
- Creation of indexes and views
- Backing up and archiving data
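A sketch of the merge and index/view steps using Python's sqlite3 module; the table and column names are hypothetical, and a production warehouse would run on a dedicated RDBMS rather than SQLite.

```python
# Warehouse-manager sketch with sqlite3: merge staged rows into a warehouse
# table, then create an index and a summary view. All names are hypothetical.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE staging_sales (sale_date TEXT, region TEXT, amount REAL);
    CREATE TABLE fact_sales   (sale_date TEXT, region TEXT, amount REAL);
    INSERT INTO staging_sales VALUES ('2024-01-01', 'North', 120.5),
                                     ('2024-01-01', 'South', 80.0);
""")

# Transformation/merging of source data from temporary storage into warehouse tables
con.execute("INSERT INTO fact_sales SELECT * FROM staging_sales")
con.execute("DELETE FROM staging_sales")

# Creation of indexes and views
con.execute("CREATE INDEX idx_sales_region ON fact_sales(region)")
con.execute("""CREATE VIEW v_sales_by_region AS
               SELECT region, SUM(amount) AS total_amount
               FROM fact_sales GROUP BY region""")

print(con.execute("SELECT * FROM v_sales_by_region").fetchall())
```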

Data Warehouse Database


The central repository for information. This database is almost always implemented on relational database management system (RDBMS) technology. Certain data warehouse attributes, such as very large database size, ad hoc query processing, and the need for flexible user view creation (including aggregates, multi-table joins, and drill-downs), have become drivers for different technology approaches to the data warehouse database. These approaches include:

Data Warehouse Database- Contd.

- Parallel relational database designs that require a parallel computing platform, such as symmetric multiprocessors (SMPs) or massively parallel processors (MPPs)
- Multidimensional databases (MDDBs)

Query Manager

Also called the back-end component. The query manager performs all the operations associated with the management of user queries, including directing queries to the appropriate tables and scheduling the execution of queries. A routing sketch follows.
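A simplified routing sketch, assuming hypothetical summary and fact table names; a real query manager would also consult metadata and a scheduler.

```python
# Query-manager sketch: direct a request to the summary table when it can be
# answered from pre-aggregated data, otherwise to the detailed fact table.
# Table names and the routing rule are hypothetical simplifications.

SUMMARY_COLUMNS = {"region", "total_amount"}   # columns held in the summary table

def route_query(requested_columns):
    if set(requested_columns) <= SUMMARY_COLUMNS:
        return "SELECT {} FROM summary_sales_by_region".format(", ".join(requested_columns))
    return "SELECT {} FROM fact_sales".format(", ".join(requested_columns))

print(route_query(["region", "total_amount"]))   # served by the summary table
print(route_query(["sale_date", "amount"]))      # needs the detailed data
```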

Detailed Data

Stores all the detailed data in the database schema. On a regular basis, detailed data is added to the warehouse to supplement the aggregated data.

Lightly and Highly Summarized Data


Stores all the predefined lightly and highly aggregated data generated by the warehouse manager. The purpose of summary information is to speed up query performance; it also removes the requirement to continually perform summary operations (such as sort or group by) when answering user queries. The summarized data is updated continually as new data is loaded into the warehouse (see the granularity sketch below).
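A sketch of the difference in granularity between lightly and highly summarized data, using hypothetical detailed records.

```python
# Lightly vs. highly summarized data built from the same detailed rows.
# The detailed records and the chosen granularities are hypothetical.
from collections import defaultdict

detail = [("2024-01", "North", 120.5), ("2024-01", "South", 80.0),
          ("2024-02", "North", 95.0)]

lightly = defaultdict(float)   # lightly summarized: by month and region
highly = defaultdict(float)    # highly summarized: by region only
for month, region, amount in detail:
    lightly[(month, region)] += amount
    highly[region] += amount

print(dict(lightly))
print(dict(highly))
```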

Archive/Backup Data
Stores detailed and summarized data for the purposes of archiving and backup. It may be necessary to back up online summary data if this data is kept beyond the retention period for detailed data. The data is transferred to storage archives such as magnetic tape or optical disk.

Metadata

This area of the warehouse stores all the metadata definitions used by all the processes in the warehouse. Metadata is used for a variety of purposes (a small catalog sketch follows the list):

- Extraction and loading processes
- Warehouse management process: used to automate the production of summary tables
- Query management process: used to direct a query to the most appropriate data source; end-user access tools also use metadata to understand how to build a query
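A minimal metadata-catalog sketch in Python; the table names, timestamps, transformation notes, and refresh policies are invented for illustration.

```python
# Minimal metadata-catalog sketch: a dictionary describing where each warehouse
# table came from and how it was produced. All entries are hypothetical.
metadata = {
    "fact_sales": {
        "source": "operational orders database",
        "extracted_at": "2024-01-02T02:00:00",
        "transformations": ["renamed amt -> amount", "defaulted missing region"],
        "grain": "one row per sale",
    },
    "summary_sales_by_region": {
        "derived_from": "fact_sales",
        "aggregation": "SUM(amount) GROUP BY region",
        "refresh": "nightly",
    },
}

def best_source(subject, needs_detail):
    """Direct a request to the most appropriate data source using the catalog."""
    summary_name = f"summary_{subject}_by_region"
    if not needs_detail and summary_name in metadata:
        return summary_name
    return f"fact_{subject}"

print(best_source("sales", needs_detail=False))
```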

End-user Access Tools


Users interact with the warehouse using end-user access tools, which can be categorized into five main groups:

- Data reporting and query tools (e.g., Query by Example in the MS Access DBMS)
- Application development tools (applications used to access major DBMSs such as Oracle, Sybase, ...)
- Executive information system (EIS) tools (for sales, marketing, and finance)
- Online analytical processing (OLAP) tools (allow users to analyze the data using complex, multidimensional views drawn from multiple databases; see the pivot sketch after this list)
- Data mining tools (allow the discovery of new patterns and trends by mining large amounts of data using statistical and mathematical techniques)
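An illustrative OLAP-style "slice and dice" view built with pandas pivot_table; pandas and the sample data are assumptions here, since the text does not name a specific OLAP tool.

```python
# OLAP-style multidimensional view: regions as rows, months as columns.
# The library choice (pandas) and the data are hypothetical illustrations.
import pandas as pd

sales = pd.DataFrame({
    "month":  ["2024-01", "2024-01", "2024-02", "2024-02"],
    "region": ["North", "South", "North", "South"],
    "amount": [120.5, 80.0, 95.0, 60.0],
})

cube = pd.pivot_table(sales, values="amount", index="region",
                      columns="month", aggfunc="sum")
print(cube)
```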

Data Warehousing: Data flows

Inflow
The processes associated with the extraction, cleansing, and loading of the data from the source systems into the data warehouse (a transformation sketch follows):
- Cleansing includes removing inconsistencies, adding missing fields, and cross-checking for data integrity.
- Transformation includes adding date/time stamp fields, summarizing detailed data, and deriving new fields to store calculated data.
- The relevant data is extracted from multiple, heterogeneous, and external sources (commercial tools are often used), then mapped and loaded into the warehouse.
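A small inflow-transformation sketch, assuming hypothetical field names, that adds a load timestamp and a derived (calculated) field to each extracted record.

```python
# Inflow transformation sketch: add a date/time stamp and a derived field to
# each extracted record before loading. Field names are hypothetical.
from datetime import datetime, timezone

def transform_inflow(records):
    loaded_at = datetime.now(timezone.utc).isoformat()
    out = []
    for rec in records:
        row = dict(rec)
        row["loaded_at"] = loaded_at                         # date/time stamp field
        row["net_amount"] = row["amount"] - row["discount"]  # derived/calculated field
        out.append(row)
    return out

print(transform_inflow([{"amount": 120.5, "discount": 10.0}]))
```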

Upflow
The processes associated with adding value to the data in the warehouse through summarizing, packaging, and distribution of the data (a packaging sketch follows):
- Summarizing the data works by selecting, projecting, joining, and grouping relational data into views that are more convenient and useful to the end users.
- Packaging the data involves converting the detailed or summarized information into more useful formats, such as spreadsheets, text documents, charts, other graphical presentations, private databases, and animation.
- Distributing the data to appropriate groups increases its availability and accessibility.
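A packaging sketch that converts a (hypothetical) summarized result into a spreadsheet-style CSV file for distribution.

```python
# Upflow packaging sketch: convert a summarized result into a spreadsheet-style
# CSV file for distribution. File and column names are hypothetical.
import csv

summary = [{"region": "North", "total_amount": 215.5},
           {"region": "South", "total_amount": 140.0}]

with open("sales_by_region.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["region", "total_amount"])
    writer.writeheader()
    writer.writerows(summary)
```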

Downflow
The processes associated with archiving and backing up data in the warehouse (an archiving sketch follows):
- Effectiveness and performance are maintained by transferring older data of limited value to storage archives such as magnetic tape, optical disk, or other digital storage devices.
- The downflow of data also includes the processes that ensure the current state of the data warehouse can be rebuilt following data loss or software/hardware failures.
- Archived data should be stored in a way that allows the re-establishment of the data in the warehouse when required.
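An archiving sketch that moves rows older than a (hypothetical) cutoff date into a compressed archive file from which they could later be restored.

```python
# Downflow archiving sketch: move rows older than a cutoff date out of the
# active data and into a compressed archive file. All names are hypothetical.
import gzip, json

fact_sales = [{"sale_date": "2019-05-01", "amount": 50.0},
              {"sale_date": "2024-01-01", "amount": 120.5}]
CUTOFF = "2023-01-01"

old_rows = [r for r in fact_sales if r["sale_date"] < CUTOFF]
fact_sales = [r for r in fact_sales if r["sale_date"] >= CUTOFF]

with gzip.open("fact_sales_archive.json.gz", "wt") as f:
    json.dump(old_rows, f)   # archived data can be re-established when required

print(len(fact_sales), "active rows;", len(old_rows), "archived")
```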

Outflow

Involves the processes associated with making the data available to the end users. This involves two activities: data accessing and data delivery.
- Data accessing is concerned with satisfying end users' requests for the data they need. The main problem here is creating an environment in which users can effectively use the query tools to access the most appropriate data source.
- The delivery activity makes it possible to deliver information to the users' systems/workstations.

Metaflow
Metaflow is a description of the data contents of the data warehouse: what is in it, where it came from originally, and what has been done to it by way of cleansing, integrating, and summarizing.

Managing the metadata (data about the data)

Thanks
