Data warehouses:
Provide a single version of the truth
Improve decision making
Support key corporate initiatives such as performance management, B2C and B2B e-commerce, and customer relationship management
Data warehousing was estimated to be a $113.5 billion market in 2002 for systems, software, services, and in-house expenditures (Palo Alto Management Group)
A Simple Definition
A data warehouse is a collection of data created to support decision-making applications.
Subject oriented -- data are organized around sales, products, etc.
Integrated -- data are integrated to provide a comprehensive view
Time variant -- historical data are maintained
Nonvolatile -- data are not updated by users
Another Definition
Data warehousing is the entire process of extracting, transforming, and loading data into the warehouse, together with the access of that data by end users and applications.
Data Mart
A data mart stores data for a limited number of subject areas, such as marketing and sales data. It is used to support specific applications. An independent data mart is created directly from source systems. A dependent data mart is populated from a data warehouse.
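The difference between the two kinds of data mart is where the rows come from. The sketch below is illustrative only: the in-memory rows, the `subject` tag, and both function names are hypothetical. A dependent mart selects a subset of subject areas from the already integrated warehouse, while an independent mart is filled straight from one source system's rows.

```python
# Hypothetical warehouse rows tagged by subject area.
warehouse = [
    {"subject": "sales", "region": "East", "amount": 120.0},
    {"subject": "marketing", "region": "East", "amount": 40.0},
    {"subject": "finance", "region": "West", "amount": 75.0},
    {"subject": "sales", "region": "West", "amount": 200.0},
]


def build_dependent_mart(warehouse_rows, subjects):
    """Dependent mart: populated from the warehouse, limited to some subject areas."""
    return [row for row in warehouse_rows if row["subject"] in subjects]


def build_independent_mart(source_rows, subject):
    """Independent mart: populated directly from a source system's rows."""
    return [dict(row, subject=subject) for row in source_rows]


sales_and_marketing = build_dependent_mart(warehouse, {"sales", "marketing"})
```

Because a dependent mart inherits the warehouse's integrated definitions, several such marts agree with one another. Independent marts built this way carry no such guarantee.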
[Figure: Data warehouse architecture. Panels: Data Sources (transaction data: Prod, Mkt, HR, Fin, Acctg; external data: Demographic); ETL Software (Extract, Load) feeding a Staging Area / Operational Data Store; Data Stores (Data Warehouse; Finance, Marketing, and Sales Data Marts; Meta Data); Users (Analysts, Managers, Executives, Operational Personnel, Customers/Suppliers) running Queries, Reporting, DSS/EIS, and Data Mining, including through a Web Browser. Products and vendors shown: IBM, IMS, VSAM, Oracle, Sybase, Informix, SQL, SAP, Harte-Hanks, Ascential, Informatica, Sagent, SAS, Teradata, Essbase, Cognos, MicroStrategy, Microsoft, Siebel, Business Objects.]
The most common approach begins with a single data mart; architected marts are then added over time for more subject areas
Relatively inexpensive and easy to implement
Can be used as a proof of concept for data warehousing
Can perpetuate the silos of information problem
Can postpone difficult decisions and activities
Requires an overall integration plan
In the alternative approach, a comprehensive warehouse is built initially
An initial dependent data mart is built using a subset of the data in the warehouse
Additional data marts are built using subsets of the data in the warehouse
Like all complex projects, it is expensive, time-consuming, and prone to failure
When successful, it results in an integrated, scalable warehouse
Warehouse data come primarily from legacy, operational systems
Almost exclusively numerical data at the present time
External data may be included, often purchased from third-party sources
Technology exists for storing unstructured data, and this is expected to become more important over time
Data Extraction
Often performed by COBOL routines (not recommended because of high program maintenance and no automatically generated meta data)
Sometimes source data is copied to the target database using the replication capabilities of a standard RDBMS (not recommended because of dirty data in the source systems)
Increasingly performed by specialized ETL software
Common dirty data problems:
Dummy Values
Absence of Data
Multipurpose Fields
Cryptic Data
Contradicting Data
Inappropriate Use of Address Lines
Violation of Business Rules
Reused Primary Keys, Non-Unique Identifiers
Data Integration Problems
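Several of these problems can be flagged mechanically before cleansing. The following is a minimal profiling sketch, not any vendor's method: the dummy-value list, the `id` key, and the 0 to 120 age rule are all hypothetical stand-ins for an organization's real business rules. It checks for dummy values, absence of data, non-unique identifiers, and one business rule violation.

```python
from collections import Counter

# Hypothetical placeholder values that source systems might use as dummies.
DUMMY_VALUES = {"999-99-9999", "00000", "N/A", "UNKNOWN"}


def profile_records(records, key="id", max_age=120):
    """Return (row index, field, problem) tuples for a few dirty-data checks."""
    issues = []
    key_counts = Counter(r.get(key) for r in records)
    for i, r in enumerate(records):
        for name, value in r.items():
            if value is None or value == "":
                issues.append((i, name, "absence of data"))
            elif isinstance(value, str) and value.upper() in DUMMY_VALUES:
                issues.append((i, name, "dummy value"))
        # The same key on more than one row suggests a reused primary key.
        if key_counts[r.get(key)] > 1:
            issues.append((i, key, "non-unique identifier"))
        # Hypothetical business rule: ages must fall between 0 and max_age.
        age = r.get("age")
        if isinstance(age, int) and not 0 <= age <= max_age:
            issues.append((i, "age", "business rule violation"))
    return issues
```

Problems such as contradicting data or multipurpose fields usually need knowledge of the source system and are harder to detect with generic checks like these.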
Data Cleansing
Source systems contain dirty data that must be cleansed
ETL software contains rudimentary data cleansing capabilities
Specialized data cleansing software is often used; it is important for performing name and address correction and householding functions
Leading data cleansing vendors include Vality (Integrity), Harte-Hanks (Trillium), and Firstlogic (i.d.Centric)
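Householding groups individual customer records that belong to the same household, typically by matching standardized addresses. The sketch below only approximates what the commercial tools do. Its tiny abbreviation table and exact-match rule on (address, ZIP) are hypothetical simplifications, whereas products like those named above rely on postal reference data and fuzzy matching.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table; real tools use full postal standards.
ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "APARTMENT": "APT"}


def normalize_address(address):
    """Uppercase, replace punctuation with spaces, and standardize suffixes."""
    cleaned = re.sub(r"[^\w\s]", " ", address.upper())
    return " ".join(ABBREVIATIONS.get(word, word) for word in cleaned.split())


def household(customers):
    """Group customer names that share a normalized address and ZIP code."""
    households = defaultdict(list)
    for c in customers:
        key = (normalize_address(c["address"]), c["zip"])
        households[key].append(c["name"])
    return dict(households)
```

With this normalization, "12 Oak Street" and "12 oak st." both become "12 OAK ST", so two customers at that address fall into one household.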
Data Staging
Often used as an interim step between data extraction and later steps
Accumulates data from asynchronous sources using native interfaces, flat files, FTP sessions, or other processes
At a predefined cutoff time, data in the staging file is transformed and loaded to the warehouse
There is usually no end user access to the staging file
An operational data store may be used for data staging
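The accumulate-then-release behavior of a staging file can be modeled in a few lines. This is a minimal sketch, assuming a single fixed cutoff and an in-memory list standing in for the staging file. The class, the source names, and the sample timestamps are hypothetical. Records from any source are stored untransformed, and the batch is handed over only once the cutoff has passed.

```python
from datetime import datetime


class StagingArea:
    """Hypothetical staging file that accumulates records until a cutoff time."""

    def __init__(self, cutoff):
        self.cutoff = cutoff  # predefined cutoff (a datetime)
        self.records = []     # read only by the ETL process, not by end users

    def accept(self, source, record):
        """Store a record from any source as-is; transformation happens later."""
        self.records.append({"source": source, **record})

    def release_batch(self, now):
        """At or after the cutoff, hand over everything accumulated and clear the file."""
        if now < self.cutoff:
            return []
        batch, self.records = self.records, []
        return batch


staging = StagingArea(cutoff=datetime(2002, 3, 15, 2, 0))
staging.accept("orders_flat_file", {"order_id": 1})
staging.accept("billing_ftp", {"invoice_id": 7})
```

Calling `release_batch` before 02:00 returns an empty list. At or after the cutoff it returns both staged records and empties the staging file for the next cycle.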
Data Transformation
Transforms the data in accordance with the business rules and standards that have been established
Examples include format changes, deduplication, splitting up fields, replacement of codes, derived values, and aggregates
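Each kind of transformation listed above can be shown on a toy order feed. The sketch below uses a hypothetical field layout (`order_id`, `date` as MM/DD/YYYY, `customer` as "First Last", one-letter region codes) and a hypothetical code table. It applies one rule of each kind; real business rules would come from the organization's own standards.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical code table for replacing cryptic source codes.
REGION_CODES = {"E": "East", "W": "West"}


def transform(rows):
    """Apply sample business rules to extracted rows."""
    seen, out = set(), []
    for r in rows:
        if r["order_id"] in seen:  # deduplication
            continue
        seen.add(r["order_id"])
        first, _, last = r["customer"].partition(" ")  # splitting up a field
        out.append({
            "order_id": r["order_id"],
            # format change: MM/DD/YYYY -> ISO 8601 (YYYY-MM-DD)
            "order_date": datetime.strptime(r["date"], "%m/%d/%Y").date().isoformat(),
            "first_name": first,
            "last_name": last,
            "region": REGION_CODES[r["region"]],  # replacement of codes
            "product": r["product"],
            "revenue": r["qty"] * r["unit_price"],  # derived value
        })
    return out


def aggregate_revenue(rows):
    """Aggregate: total revenue per (region, product)."""
    totals = defaultdict(float)
    for r in rows:
        totals[(r["region"], r["product"])] += r["revenue"]
    return dict(totals)
```

Keeping row-level rules in `transform` and summarization in `aggregate_revenue` mirrors a common choice: the warehouse keeps the detailed rows and stores aggregates alongside them.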
Data Loading
Data are physically moved to the data warehouse
The loading takes place within a load window
The trend is toward near-real-time updates of the data warehouse, as the warehouse is increasingly used for operational applications
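A load window is a period, often overnight, when the warehouse accepts bulk loads. The following sketch shows a loader that refuses to run outside that window. The 01:00 to 05:00 default and the list-based "warehouse" are hypothetical, and the window check also handles windows that cross midnight.

```python
from datetime import time


def in_load_window(now, start, end):
    """True if clock time `now` is inside [start, end); supports windows that cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def load(batch, warehouse, now, start=time(1, 0), end=time(5, 0)):
    """Append a transformed batch to the warehouse table only during the load window.

    The 01:00-05:00 default window is a hypothetical example. Returns rows loaded.
    """
    if not in_load_window(now, start, end):
        return 0
    warehouse.extend(batch)
    return len(batch)
```

As loads move toward near real time, the window shrinks or disappears and this check gives way to continuous, smaller loads.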
Meta Data
Data about data
Needed by both information technology personnel and users
IT personnel need to know data sources and targets; database, table, and column names; refresh schedules; data usage measures; etc.
Users need to know entity/attribute definitions; available report/query tools; report distribution information; help desk contact information; etc.
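One way to picture the two audiences is a single meta data record that exposes a technical view and a business view. This is a hypothetical sketch: the `ColumnMetadata` class and its fields are illustrative and do not reflect any particular repository product. It covers only a few of the items listed above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ColumnMetadata:
    """Hypothetical meta data record for one warehouse column."""
    # Technical meta data, mainly for IT personnel
    source: str               # source system field
    target_table: str
    target_column: str
    refresh_schedule: str
    # Business meta data, mainly for end users
    definition: str
    reports: List[str] = field(default_factory=list)

    def it_view(self):
        """Sources, targets, and refresh schedule."""
        return {"source": self.source,
                "target": f"{self.target_table}.{self.target_column}",
                "refresh": self.refresh_schedule}

    def user_view(self):
        """Definitions and the reports that use this column."""
        return {"definition": self.definition, "reports": list(self.reports)}
```

Storing both kinds of meta data in one record keeps them consistent. Each audience then sees only the view it needs.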
Database Vendors
High-end (i.e., terabyte-plus) vendors include IBM (DB2) and NCR-Teradata (Teradata)
Oracle (8i) and Microsoft (SQL Server 7) are major players for smaller databases
ROLAP
Relational OLAP
Uses an RDBMS to implement an OLAP environment
Typically involves a star schema to provide the multidimensional capabilities
The OLAP tool manipulates the RDBMS star schema data
Called "slowlap" by MOLAP vendors
MOLAP
Multidimensional OLAP
Uses a multidimensional database system (MDDBS, e.g., Essbase) to store and access data
Usually requires proprietary (non-SQL) data access tools
Provides exceptionally fast response times
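Fast MOLAP response times typically rely on precomputing aggregates in the multidimensional store, so that a query becomes a lookup instead of a scan. The following plain-Python sketch illustrates that idea only. It is not how Essbase stores data and does not use its API, and the use of `"ALL"` as the rolled-up member is a hypothetical convention.

```python
from itertools import product


def build_cube(facts, dims, measure):
    """Precompute sums for every combination of dimension members.

    "ALL" stands for a dimension rolled up to its total, so each fact row
    contributes to 2 ** len(dims) cells.
    """
    cube = {}
    for row in facts:
        members = [(row[d], "ALL") for d in dims]
        for cell in product(*members):
            cube[cell] = cube.get(cell, 0) + row[measure]
    return cube


facts = [
    {"product": "gauze", "region": "East", "sales": 10},
    {"product": "gauze", "region": "West", "sales": 5},
    {"product": "tape", "region": "East", "sales": 2},
]
cube = build_cube(facts, ["product", "region"], "sales")
# Queries are now dictionary lookups rather than scans.
```

The trade-off is storage and load time. The number of precomputed cells grows quickly with the number of dimensions, which is one reason MOLAP stores are usually kept smaller than relational warehouses.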
Star Schema
Creates denormalized data structures
Easier for users to understand
Optimized for OLAP
Uses fact tables (the facts or measures of the business) and dimension tables (which establish the context of the facts)
OLAP Tools
Products come from vendors such as Brio, Cognos, Hyperion, and BusinessObjects
Typically available as a fat or thin (i.e., browser) client
In a web environment, the browser communicates with a web server, which talks to an application server, which connects to backend databases
The application server provides query, reporting, and OLAP analysis functionality over the web
Java applets or downloaded components augment the thin client
A broadcast server may be used to schedule, run, publish, and broadcast reports, alerts, and responses over the LAN, by email, or to personal digital assistants
Star Schema
Example: health care claims (# marks key columns)
Claim (fact table): #Claim Number, #Line Item Number, #Patient ID, #Physician ID, #Service Code, #Payer ID, #Claim Date, Date of Services, Amount of Charge, Unit of Services
Patient (dimension): #Patient ID, Patient Name, Address, Age, Sex, Insurance ID
Physician (dimension): #Physician ID, Physician Name, Specialty ID, Credential ID
Service (dimension): #Service Code, Service Description, #Category Code
Payer (dimension): #Payer ID, Name, Address, Phone Number, EDI Number
Time Periods (dimension): #Claim Date, Year, Month, Quarter, Week
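The claims star schema can be built in any RDBMS, and a ROLAP tool answers a multidimensional question by generating a "star join" of the fact table to its dimensions. The sketch below is a minimal one under several stated assumptions. It uses Python's built-in `sqlite3` as a stand-in warehouse database. Table and column names are hypothetical snake_case versions of the schema, the patient and payer dimensions are omitted for brevity, and the sample rows are invented.

```python
import sqlite3

SCHEMA = """
CREATE TABLE physician (physician_id INTEGER PRIMARY KEY, physician_name TEXT, specialty_id TEXT);
CREATE TABLE service (service_code TEXT PRIMARY KEY, service_description TEXT, category_code TEXT);
CREATE TABLE time_period (claim_date TEXT PRIMARY KEY, year INTEGER, quarter INTEGER,
                          month INTEGER, week INTEGER);
CREATE TABLE claim (                          -- fact table
    claim_number INTEGER, line_item_number INTEGER,
    physician_id INTEGER REFERENCES physician,
    service_code TEXT REFERENCES service,
    claim_date TEXT REFERENCES time_period,
    amount_of_charge REAL, unit_of_services INTEGER,
    PRIMARY KEY (claim_number, line_item_number)
);
"""

# Star join: total charges by physician specialty and quarter.
STAR_QUERY = """
SELECT p.specialty_id, t.quarter, SUM(c.amount_of_charge) AS total_charge
FROM claim c
JOIN physician p ON c.physician_id = p.physician_id
JOIN time_period t ON c.claim_date = t.claim_date
GROUP BY p.specialty_id, t.quarter
ORDER BY p.specialty_id, t.quarter
"""


def build_demo():
    """Create the schema in memory and insert a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO physician VALUES (?, ?, ?)",
                     [(1, "Dr. A", "CARD"), (2, "Dr. B", "PEDS")])
    conn.executemany("INSERT INTO service VALUES (?, ?, ?)",
                     [("99213", "Office visit", "EM")])
    conn.executemany("INSERT INTO time_period VALUES (?, ?, ?, ?, ?)",
                     [("2002-01-15", 2002, 1, 1, 3), ("2002-04-10", 2002, 2, 4, 15)])
    conn.executemany("INSERT INTO claim VALUES (?, ?, ?, ?, ?, ?, ?)",
                     [(100, 1, 1, "99213", "2002-01-15", 80.0, 1),
                      (100, 2, 1, "99213", "2002-01-15", 40.0, 1),
                      (101, 1, 2, "99213", "2002-04-10", 65.0, 1)])
    return conn
```

Running `build_demo().execute(STAR_QUERY).fetchall()` returns one row per specialty and quarter. Adding another dimension to the analysis only means one more join and one more `GROUP BY` column, which is why the star schema suits OLAP.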
Dimension examples by industry:
Retail -- store name, zip code, product name, product category, day of week
Telecommunications -- call origin, call destination
Banking -- customer name, account number, branch, account officer
Insurance -- policy type, insured party
Warehouse Users
Analysts
Managers
Executives
Operational personnel
Customers and suppliers
Ways of accessing the warehouse:
SQL queries
Managed query environments
Structured and ad hoc reports
DSS/EIS
Portals
Data mining
Packaged applications
Custom-built applications
Owens & Minor -- data warehousing has supported integration along the supply chain. Winner of the 1999 TDWI Leadership Award, the nation's leading distributor of name-brand medical and surgical supplies has transformed its business model by integrating supply chain management, e-business, data warehousing, and Internet technologies. As part of this initiative, WISDOM (WebIntelligence Supporting Decisions from Owens & Minor) has been especially valuable.
[Figure: Supply chain. Product flows from raw materials suppliers to manufacturers (1,400 manufacturers) to Owens & Minor to providers (4,000 acute care facilities) to patients; information flows along the same chain.]
WISDOM
A Web-based decision support system that provides information to Owens & Minor's employees, suppliers, and customers
Accesses data from a data warehouse that maintains supplier and customer transaction data
Sold to trading partners as a value-added product
WISDOM II provides data about the transactions that suppliers and customers have with all of their trading partners
Sample Applications
Supports reporting and queries for internal personnel
Supports an EIS for senior management
Suppliers can determine their market share in specific hospitals
Hospitals can identify which products are being bought off contract
WISDOM II extends data warehousing to trading partners through an outsourcing arrangement