An online operational database system that performs online transaction and query processing is called an Online Transaction Processing (OLTP) system. Examples include the day-to-day operations of an organization, such as purchasing, inventory, manufacturing, banking, payroll, registration, and accounting. An OLTP system deals with operational data, that is, the data involved in the operation of a particular system. Example: in a banking system, when you withdraw an amount from your account, the account number, withdrawal amount, available amount, balance amount, transaction number, and so on are operational data elements.
OLTP
In an OLTP system, data are frequently updated and queried, so a quick response to each request is expected. Since OLTP systems involve a large number of update queries, the database tables are optimized for write operations. To prevent data redundancy and update anomalies, the database tables are normalized. Normalization makes write operations on the database tables more efficient, because each fact is stored in only one place. Operational data are usually of local relevance: a typical query accesses an individual tuple (an individual record). These types of queries are termed point queries.
Examples for OLTP Queries:
What is the salary of Mr. John?
Withdraw money from a bank account (this performs an update operation when money is withdrawn from the account).
What are the address and email ID of the person who is the head of the maths department?
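The point query and the withdrawal update described above can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration: the `account` table, its columns, and the helper names `get_balance` and `withdraw` are assumptions made for the example, not part of any real banking system.

```python
import sqlite3

# Hypothetical single-table schema; all names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_no TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
)
conn.execute("INSERT INTO account VALUES ('A-100', 500)")
conn.commit()


def get_balance(conn, account_no):
    """Point query: reads exactly one tuple, located through the primary key."""
    row = conn.execute(
        "SELECT balance FROM account WHERE account_no = ?", (account_no,)
    ).fetchone()
    return None if row is None else row[0]


def withdraw(conn, account_no, amount):
    """Short update transaction; returns False if the funds are insufficient."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE account SET balance = balance - ? "
            "WHERE account_no = ? AND balance >= ?",
            (amount, account_no, amount),
        )
        return cur.rowcount == 1


ok = withdraw(conn, "A-100", 200)
print(ok, get_balance(conn, "A-100"))
```

Both operations touch a single row and finish quickly, which is the access pattern that OLTP tables are tuned for.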
What is OLAP
Basic idea: converting data into the information that decision makers need.
Concept: analyzing data along multiple dimensions in a structure called a data cube.
OLAP
OLAP designates a category of applications and technologies that allow the collection, storage, manipulation, and reproduction of multidimensional data, with the goal of analysis.
History
In 1993, E. F. Codd introduced the term online analytical processing (OLAP) in his paper "Providing OLAP (On-line Analytical Processing) to User-Analysts: An IT Mandate". The term OLAP seems well suited to describe databases designed to facilitate decision making (analysis) in an organization.
Purpose of OLAP
To derive summarized information from a large-volume database.
To generate automated reports for human viewing.
Examples for OLAP Queries:
How is the profit changing over the years across different regions?
Is it financially viable to continue the production unit at location X?
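In contrast to a point query, the two OLAP questions above summarize many rows. A minimal sketch with sqlite3 follows; the `profit` table, its columns, and all figures are invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per year and production location.
conn.execute(
    "CREATE TABLE profit (year INTEGER, region TEXT, location TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO profit VALUES (?, ?, ?, ?)",
    [
        (2021, "North", "Y", 120),
        (2021, "North", "Z", 80),
        (2021, "South", "X", 40),
        (2022, "North", "Y", 150),
        (2022, "North", "Z", 90),
        (2022, "South", "X", -25),
    ],
)

# "How is the profit changing over the years across different regions?"
profit_by_year_region = conn.execute(
    "SELECT year, region, SUM(amount) FROM profit "
    "GROUP BY year, region ORDER BY year, region"
).fetchall()

# Input for "Is it viable to continue the production unit at location X?"
profit_at_x = conn.execute(
    "SELECT year, SUM(amount) FROM profit WHERE location = 'X' "
    "GROUP BY year ORDER BY year"
).fetchall()

print(profit_by_year_region)
print(profit_at_x)
```

Each query scans and aggregates the whole table instead of fetching one record, which is why OLAP workloads favor read-optimized structures.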
OLAP, by Dr. Khalil
What and Why OLAP?
OLAP enables users to gain a deeper understanding and knowledge about various aspects of their corporate data through fast, consistent, interactive access to a variety of possible views of the data. While OLAP systems can easily answer "who?" and "what?" questions, it is their ability to answer "what if?" and "why?" questions that distinguishes them from general-purpose query tools. The types of analysis available from OLAP range from basic navigation and browsing (referred to as slicing and dicing), to calculations, to more complex analyses such as time series and complex modeling.
OLAP Applications
Finance: Budgeting, activity-based costing, financial performance analysis, and financial modeling.
Sales: Sales analysis and sales forecasting.
Marketing: Market research analysis, sales forecasting, promotions analysis, customer analysis, and market/customer segmentation.
Manufacturing: Production planning and defect analysis.
OLAP Benefits
Increased productivity of business end-users, IT developers, and consequently the entire organization.
Reduced backlog of applications development for IT staff, by making end-users self-sufficient enough to make their own schema changes and build their own models.
Retention of organizational control over the integrity of corporate data, as OLAP applications depend on data warehouses and OLTP systems to refresh their source-level data.
Improved potential revenue and profitability, by enabling the organization to respond more quickly to market demands.
OLTP System: Online Transaction Processing (Operational System)
OLAP System: Online Analytical Processing (Data Warehouse)
Source of data
  OLTP: Operational data; OLTP systems are the original source of the data.
  OLAP: Consolidated data; OLAP data comes from the various OLTP databases.
Purpose of data
  OLTP: To control and run fundamental business tasks.
  OLAP: To help with planning, problem solving, and decision support.
What the data reveals
  OLTP: A snapshot of ongoing business processes.
  OLAP: Multi-dimensional views of various kinds of business activities.
Inserts and updates
  OLTP: Short and fast inserts and updates initiated by end users.
  OLAP: Periodic long-running batch jobs refresh the data.
Queries
  OLTP: Relatively standardized and simple queries returning relatively few records.
  OLAP: Often complex queries involving aggregations.
Processing speed
  OLTP: Typically very fast.
  OLAP: Depends on the amount of data involved; batch data refreshes and complex queries may take many hours; query speed can be improved by creating indexes.
Space requirements
  OLTP: Can be relatively small if historical data is archived.
  OLAP: Larger, due to the existence of aggregation structures and history data; requires more indexes than OLTP.
Database design
  OLTP: Highly normalized, with many tables.
  OLAP: Typically de-normalized, with fewer tables; uses star and/or snowflake schemas.
Backup and recovery
  OLTP: Back up religiously; operational data is critical to running the business, and data loss is likely to entail significant monetary loss and legal liability.
  OLAP: Instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method.
Schema
A schema (pronounced skee-ma) is the structure of a database system, described in a formal language supported by the database management system (DBMS). In a relational database, the schema defines the tables, the fields in each table, and the relationships between fields and tables. Schemas are generally stored in a data dictionary. Although a schema is defined in a textual database language, the term is often used to refer to a graphical depiction of the database structure.
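As a small illustration of a schema that is defined in a textual language and then kept by the DBMS, the sketch below uses SQLite, which stores schema text in its built-in catalog table `sqlite_master`. The `department` and `employee` tables are hypothetical examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The schema is written in a textual language (SQL DDL).
conn.executescript(
    """
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER REFERENCES department(dept_id)
    );
    """
)

# Tables recorded in SQLite's catalog.
tables = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# Fields of one table; each row is (cid, name, type, notnull, dflt_value, pk).
employee_fields = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]

# Relationships; each row is (id, seq, table, from, to, ...).
employee_refs = [
    (row[3], row[2], row[4])
    for row in conn.execute("PRAGMA foreign_key_list(employee)")
]

print(tables, employee_fields, employee_refs)
```

The three results correspond to the three things the schema defines: the tables, the fields in each table, and the relationships between them.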
Types of Schemas
In a database: hierarchical model, network model, relational model (RDBMS).
In a data warehouse: star schema, snowflake schema.
Star schema
The star schema architecture is the simplest data warehouse schema. It is called a star schema because the diagram resembles a star, with points radiating from a center. The center of the star consists of the fact table, and the points of the star are the dimension tables. Usually the fact tables in a star schema are in third normal form (3NF), whereas the dimension tables are de-normalized. Although the star schema is the simplest architecture, it is the most commonly used nowadays and is recommended by Oracle.
Star Schema Fact Tables
A fact table typically has two types of columns: foreign keys to the dimension tables, and measures, which contain the raw numeric items that represent relevant business facts. A fact table can contain facts at a detailed or an aggregated level, and it tends to be very large.
Star Schema Dimension Tables
A dimension table is a structure, usually composed of one or more hierarchies, that categorizes data. A dimension that has no hierarchies or levels is called a flat dimension or list. Dimension tables are joined to the fact table using foreign key references. Dimension tables are generally smaller than the fact table.
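The fact and dimension tables described above can be sketched as a tiny star schema in SQLite. All table names, column names, and rows here are assumptions chosen for illustration; note that `dim_store` keeps city and region in one de-normalized table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: small, descriptive, de-normalized.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, brand TEXT);
    CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, city TEXT, region TEXT);

    -- Fact table: foreign keys to the dimensions plus a numeric measure.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id   INTEGER REFERENCES dim_store(store_id),
        day        INTEGER,
        amount     INTEGER
    );

    INSERT INTO dim_product VALUES (1, 'Milk', 'DairyCo'), (2, 'Juice', 'FruitCo');
    INSERT INTO dim_store   VALUES (1, 'NY', 'East'), (2, 'LA', 'West');
    INSERT INTO fact_sales  VALUES (1, 1, 1, 12), (2, 1, 1, 11), (1, 2, 1, 50), (1, 1, 2, 44);
    """
)

# A typical star join: one join per dimension, then aggregate the measure.
sales_by_region_brand = conn.execute(
    """
    SELECT s.region, p.brand, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_store   AS s ON s.store_id   = f.store_id
    GROUP BY s.region, p.brand
    ORDER BY s.region, p.brand
    """
).fetchall()

print(sales_by_region_brand)
```

Every dimension is reached from the fact table in a single join, which is what keeps star-schema queries short.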
Typical fact tables store data about sales, while dimension tables store data about geographic regions (markets, cities), customers, products, and time.
Characteristics of star schema:
Simple structure -> easy-to-understand schema
Great query effectiveness -> small number of tables to join
Relatively long time to load data into dimension tables -> dimension tables are de-normalized
The most commonly used schema in data warehouse implementations -> widely supported by a large number of business intelligence tools
Snowflake schema
A snowflake schema is a logical arrangement of tables in a multidimensional database such that the entity relationship diagram resembles a snowflake shape. It is represented by centralized fact tables that are connected to multiple dimensions. "Snowflaking" is a method of normalising the dimension tables in a star schema. When the schema is completely normalised along all the dimension tables, the resulting structure resembles a snowflake with the fact table in the middle.
Snowflake Schema versus Star Schema
Ease of maintenance / change
  Snowflake: No redundancy, and hence easier to maintain and change.
  Star: Has redundant data, and hence less easy to maintain and change.
Ease of use
  Snowflake: More complex queries, and hence less easy to understand.
  Star: Less complex queries that are easy to understand.
Query performance
  Snowflake: More foreign keys, and hence longer query execution time.
  Star: Fewer foreign keys, and hence shorter query execution time.
Type of data warehouse
  Snowflake: Good for the data warehouse core, to simplify complex relationships (many:many).
  Star: Good for data marts with simple relationships (1:1 or 1:many).
Joins
  Snowflake: Higher number of joins.
  Star: Fewer joins.
Dimension table
  Snowflake: May have more than one dimension table for each dimension.
  Star: Contains only a single dimension table for each dimension.
When to use
  Snowflake: When a dimension table is relatively big, snowflaking is better because it reduces space.
  Star: When a dimension table contains a small number of rows, a star schema is suitable.
Normalization / de-normalization
  Snowflake: Dimension tables are in normalized form, but the fact table is still in de-normalized form.
  Star: Both dimension and fact tables are in de-normalized form.
Data model
  Snowflake: Bottom-up approach.
  Star: Top-down approach.
Cube
A cube is a multidimensional structure that contains information for analytical purposes; the main constituents of a cube are dimensions and measures. Dimensions define the structure of the cube that you use to slice and dice, and measures provide aggregated numerical values of interest to the end user. As a logical structure, a cube allows a client application to retrieve the values of measures as if they were contained in cells in the cube; cells are defined for every possible summarized value. A cell in the cube is defined by the intersection of dimension members and contains the aggregated values of the measures at that specific intersection.
Benefit of Using Cubes
A cube provides a single place where all related data for analysis is stored.
3-D Cube (dimensions = 3)
Fact table view: sale
prodId  storeId  date  amt
p1      s1       1     12
p2      s1       1     11
p1      s3       1     50
p2      s2       1     8
p1      s1       2     44
p1      s2       2     4
Multi-dimensional cube:
day 1   s1   s2   s3
p1      12   -    50
p2      11   8    -
day 2   s1   s2   s3
p1      44   4    -
p2      -    -    -
Example
[Figure: a data cube with axes Product (Juice, Milk, Coke, Cream, Soap, Bread), Time (M, T, W, Th, F, S, S), and Store (NY, SF, LA); the highlighted cell shows 56 units of bread sold in LA on M.]
Dimensions: Time, Product, Store
Attributes: Product (upc, price, ...), Store (...)
Hierarchies: Product -> Brand (roll-up to brand); Day -> Week -> Quarter (roll-up to week); Store -> Region -> Country (roll-up to region)
Representation of Multi-Dimensional Data
OLAP database servers use multi-dimensional structures to store data and the relationships between data. Multi-dimensional structures are best visualized as cubes of data, and cubes within cubes of data. Each side of a cube is a dimension.
Multi-dimensional databases are a compact and easy-to-understand way of visualizing and manipulating data elements that have many inter-relationships. The cube can be expanded to include another dimension, for example, the number of sales staff in each city. The response time of a multi-dimensional query depends on how many cells have to be added on the fly. As the number of dimensions increases, the number of cells in the cube increases exponentially.
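To make the cell growth concrete, the sketch below loads the small `sale` fact table from the 3-D cube example into a dictionary keyed by (product, store, day) and compares the number of filled cells with the number of possible cells. The dictionary representation and the helper name `potential_cells` are assumptions made for illustration, not how any particular OLAP server stores its data.

```python
from collections import defaultdict
from math import prod

# Rows of the sale fact table: (prodId, storeId, date, amt).
sale = [
    ("p1", "s1", 1, 12),
    ("p2", "s1", 1, 11),
    ("p1", "s3", 1, 50),
    ("p2", "s2", 1, 8),
    ("p1", "s1", 2, 44),
    ("p1", "s2", 2, 4),
]

# Each cube cell sits at the intersection of one member per dimension.
cells = defaultdict(int)
for prod_id, store_id, day, amt in sale:
    cells[(prod_id, store_id, day)] += amt
cube = dict(cells)


def potential_cells(cardinalities):
    """Number of cells in a full cube: the product of the dimension sizes."""
    return prod(cardinalities)


# Count the distinct members of each of the three dimensions.
sizes = [len({row[i] for row in sale}) for i in range(3)]
total_cells = potential_cells(sizes)

print(len(cube), "filled cells out of", total_cells)
# With 10 members per dimension, every added dimension multiplies the cell count by 10.
print([potential_cells([10] * n) for n in range(1, 5)])
```

Only part of the cube is filled even in this tiny example, and the gap widens quickly as dimensions are added.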
Multi-dimensional OLAP supports common analytical operations, such as:
Consolidation: involves the aggregation of data, such as roll-ups or complex expressions involving interrelated data. For example, branch offices can be rolled up to cities, and cities rolled up to countries.
Drill-down: is the reverse of consolidation and involves displaying the detailed data that make up the consolidated data.
Slicing and dicing: refers to the ability to look at the data from different viewpoints. Slicing and dicing is often performed along a time axis in order to analyze trends and find patterns.
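The three operations above can be sketched on a cube held as a plain dictionary. The cell values, the city-to-region mapping, and the function names are all hypothetical; a real OLAP server exposes these operations through its own query language.

```python
from collections import defaultdict

# Hypothetical cube cells: (product, city, day) -> units sold.
cells = {
    ("Bread", "LA", "M"): 56,
    ("Bread", "SF", "M"): 20,
    ("Bread", "NY", "M"): 10,
    ("Milk", "LA", "M"): 34,
    ("Milk", "SF", "T"): 32,
}
# Hypothetical hierarchy level used for consolidation: city -> region.
REGION = {"LA": "West", "SF": "West", "NY": "East"}


def lift(key, mapping, axis):
    """Replace the coordinate on one axis by its parent in the hierarchy."""
    return key[:axis] + (mapping[key[axis]],) + key[axis + 1:]


def roll_up(cells, mapping, axis):
    """Consolidation: aggregate cells that share the same parent member."""
    out = defaultdict(int)
    for key, value in cells.items():
        out[lift(key, mapping, axis)] += value
    return dict(out)


def drill_down(cells, mapping, axis, parent_key):
    """Drill-down: the detailed cells behind one consolidated cell."""
    return {k: v for k, v in cells.items() if lift(k, mapping, axis) == parent_key}


def slice_(cells, axis, member):
    """Slice: fix one dimension to a single member."""
    return {k: v for k, v in cells.items() if k[axis] == member}


def dice(cells, allowed):
    """Dice: keep cells whose coordinates lie in the allowed set of each axis."""
    return {
        k: v
        for k, v in cells.items()
        if all(k[axis] in members for axis, members in allowed.items())
    }


by_region = roll_up(cells, REGION, axis=1)
detail = drill_down(cells, REGION, 1, ("Bread", "West", "M"))
monday = slice_(cells, axis=2, member="M")
bread_la_ny = dice(cells, {0: {"Bread"}, 1: {"LA", "NY"}})
print(by_region)
```

Rolling up sums detailed cells into their parent, and drilling down returns exactly the cells that produced one consolidated value.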
OLAP cube basics: measures, dimensions, hierarchies, levels.
OLAP Implementation: Multidimensional OLAP (MOLAP), Relational OLAP (ROLAP), Hybrid OLAP (HOLAP).
Multi-dimensional OLAP (MOLAP)
MOLAP tools use specialized data structures and multi-dimensional database management systems (MDDBMS) to organize, navigate, and analyze data. To enhance query performance, the data is typically aggregated and stored according to predicted usage. MOLAP data structures use array technology and efficient storage techniques that minimize disk space requirements through sparse data management.
The development issues associated with MOLAP:
Only a limited amount of data can be efficiently stored and analyzed.
Navigation and analysis of data are limited, because the data is designed according to previously determined requirements.
MOLAP products require a different set of skills and tools to build and maintain the database.
Relational OLAP (ROLAP)
ROLAP is the fastest-growing type of OLAP tool. ROLAP supports RDBMS products through the use of a metadata layer, thus avoiding the requirement to create a static multi-dimensional data structure. This facilitates the creation of multiple multi-dimensional views of the two-dimensional relation. To improve performance, some ROLAP products have enhanced SQL engines to support the complexity of multi-dimensional analysis, while others recommend, or require, the use of highly denormalized database designs such as the star schema.
The development issues associated with ROLAP technology:
Performance problems associated with the processing of complex queries that require multiple passes through the relational data.
Development of middleware to facilitate the development of multi-dimensional applications.
Development of an option to create persistent multi-dimensional structures, together with facilities to assist in the administration of these structures.
HOLAP
HOLAP, a hybrid of ROLAP and MOLAP, can be thought of as a virtual database in which the higher levels of the database are implemented as MOLAP and the lower levels as ROLAP.
Hybrid OLAP (HOLAP)
HOLAP tools provide limited analysis capability, either directly against RDBMS products or by using an intermediate MOLAP server. HOLAP tools deliver selected data directly from the DBMS, or via a MOLAP server, to the desktop (or local server) in the form of a data cube, where it is stored, analyzed, and maintained locally.
The issues associated with HOLAP tools:
The architecture results in significant data redundancy and may cause problems for networks that support many users.
The ability of each user to build a custom data cube may cause a lack of data consistency among users.
Only a limited amount of data can be efficiently maintained.
MOLAP (Multidimensional Online Analytical Processing)
The MOLAP storage mode causes the aggregations of the partition and a copy of its source data to be stored in a multidimensional structure in Analysis Services when the partition is processed. This MOLAP structure is highly optimized to maximize query performance. The storage location can be on the computer where the partition is defined or on another computer running Analysis Services. Because a copy of the source data resides in the multidimensional structure, queries can be resolved without accessing the partition's source data. Query response times can be decreased substantially by using aggregations. The data in the partition's MOLAP structure is only as current as the most recent processing of the partition.
ROLAP (Relational Online Analytical Processing)
The ROLAP storage mode causes the aggregations of the partition to be stored in indexed views in the relational database that was specified in the partition's data source. Unlike the MOLAP storage mode, ROLAP does not cause a copy of the source data to be stored in the Analysis Services data folders. Instead, when results cannot be derived from the query cache, the indexed views in the data source are accessed to answer queries. Query response is generally slower with ROLAP storage than with the MOLAP or HOLAP storage modes. Processing time is also typically slower with ROLAP. However, ROLAP enables users to view data in real time and can save storage space when you are working with large datasets that are infrequently queried, such as purely historical data.
HOLAP (Hybrid Online Analytical Processing)
The HOLAP storage mode combines attributes of both MOLAP and ROLAP. Like MOLAP, HOLAP causes the aggregations of the partition to be stored in a multidimensional structure in an SQL Server Analysis Services instance. HOLAP does not cause a copy of the source data to be stored. For queries that access only summary data in the aggregations of a partition, HOLAP is the equivalent of MOLAP. Queries that access source data (for example, if you want to drill down to an atomic cube cell for which there is no aggregation data) must retrieve data from the relational database and will not be as fast as they would be if the source data were stored in the MOLAP structure. With the HOLAP storage mode, users will typically experience substantial differences in query times depending on whether the query can be resolved from cache or aggregations or must go to the source data itself.
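The trade-off between stored aggregations and live source data can be sketched in a few lines. This is a toy model only: the `HybridStore` class and its methods are hypothetical names and do not correspond to the Analysis Services API. A dictionary stands in for the multidimensional aggregation store, and an SQLite table stands in for the relational source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prod_id TEXT, store_id TEXT, day INTEGER, amt INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [
        ("p1", "s1", 1, 12), ("p2", "s1", 1, 11), ("p1", "s3", 1, 50),
        ("p2", "s2", 1, 8), ("p1", "s1", 2, 44), ("p1", "s2", 2, 4),
    ],
)


class HybridStore:
    """Toy HOLAP-style store: aggregations held in memory, detail left in the RDBMS."""

    def __init__(self, conn):
        self.conn = conn
        self.aggregations = {}
        self.process()

    def process(self):
        """Rebuild the stored aggregations (total per product and day) from the source."""
        self.aggregations = {
            (prod_id, day): total
            for prod_id, day, total in self.conn.execute(
                "SELECT prod_id, day, SUM(amt) FROM sale GROUP BY prod_id, day"
            )
        }

    def total(self, prod_id, day):
        """Summary query: answered from the stored aggregations only."""
        return self.aggregations.get((prod_id, day), 0)

    def detail(self, prod_id, store_id, day):
        """Atomic cell: no aggregation is stored, so query the relational source."""
        row = self.conn.execute(
            "SELECT SUM(amt) FROM sale WHERE prod_id = ? AND store_id = ? AND day = ?",
            (prod_id, store_id, day),
        ).fetchone()
        return row[0] or 0


store = HybridStore(conn)
conn.execute("INSERT INTO sale VALUES ('p1', 's1', 2, 6)")  # new source row
stale_total = store.total("p1", 2)          # still reflects the last processing
fresh_detail = store.detail("p1", "s1", 2)  # read live from the source
store.process()
refreshed_total = store.total("p1", 2)
print(stale_total, fresh_detail, refreshed_total)
```

The summary lookup never touches the database, while the detail lookup always does. That is the reason query times differ between the two paths, and the reason stored aggregations are only as current as the last processing run.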