Joachim Hammer
[Figure: query-driven integration architecture. An integration system, with metadata, sits above a set of wrappers, one per source. Source types shown: World Wide Web, digital libraries, scientific databases, personal databases.]
Disadvantages of Query-Driven Approach
Delay in query processing
Slow or unavailable information sources
Complex filtering and integration
[Figure: warehousing architecture. Components shown: clients, a data warehouse, an integration system with metadata, and one extractor/monitor per source.]
Advantages of Warehousing Approach
High query performance
But not necessarily most current information
What is a Warehouse?
Stored collection of diverse data
A solution to data integration problem
Single repository of information
Subject-oriented
Organized by subject, not by application
Used for analysis, data mining, etc.
Updates infrequent
May be append-only
Examples
All transactions ever at WalMart
Complete client histories at insurance firm
Stockbroker financial information and portfolios
Warehouse is a Specialized DB
Standard DB | Warehouse
Mostly updates | Mostly reads
Many small transactions | Queries are long and complex
MB to GB of data | GB to TB of data
Current snapshot | History
Index/hash on primary key | Lots of scans
Raw data | Summarized, consolidated data
Thousands of users (e.g., clerical users) | Hundreds of users (e.g., decision-makers, analysts)
Warehouse DBMS Buzzwords
Used primarily for decision support (DSS)
A.K.A. On-Line Analytical Processing (OLAP)
Complex queries, substantial aggregation
TPC-D benchmark
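As a small illustration of the kind of aggregation a decision-support query performs, the sketch below groups and sums rows with Python's built-in sqlite3 module. The `sales` table, its columns and the sample rows are hypothetical and are not taken from the slides or from the TPC-D schema.

```python
import sqlite3

# In-memory database holding a hypothetical fact table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("east", 1997, 100.0), ("east", 1997, 50.0),
    ("east", 1998, 70.0), ("west", 1997, 30.0),
])

# Aggregate over the whole table: total sales per region and year.
totals = conn.execute(
    "SELECT region, year, SUM(amount) FROM sales "
    "GROUP BY region, year ORDER BY region, year"
).fetchall()
```

A query like this scans every row rather than looking up a single key, which is the access pattern that distinguishes warehouse workloads from transaction processing.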
[Figure: warehousing architecture (clients, data warehouse, integration system with metadata, extractor/monitors, sources).]
Extraction
Integration
Warehousing specification
Optimizations
Miscellaneous
Data Extraction
Translation
Translate information to warehouse data model
Similar to wrapper in query-driven approach [Stanford TSIMMIS project]
Change detection
For on-line data propagation and incremental warehouse refresh
Different classes of information sources
Cooperative
Logged
Queryable
Snapshot
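For a snapshot source, one plausible way to detect changes is to compare successive dumps. The sketch below is a minimal illustration of that idea, assuming (hypothetically) that each snapshot is available as a mapping from a key to a row. The function `diff_snapshots` and the sample data are invented for illustration and do not come from the slides.

```python
from typing import Dict, List, Tuple

Row = Tuple  # assumption: a row is a tuple of column values

def diff_snapshots(old: Dict[str, Row], new: Dict[str, Row]) -> Dict[str, List]:
    """Compare two keyed snapshots and report inserts, deletes and updates."""
    inserts = [(k, new[k]) for k in new if k not in old]
    deletes = [(k, old[k]) for k in old if k not in new]
    updates = [(k, old[k], new[k])
               for k in new if k in old and old[k] != new[k]]
    return {"insert": inserts, "delete": deletes, "update": updates}

# Two hypothetical dumps of the same source taken at different times.
old_dump = {"e1": ("Sue", 26), "e2": ("Bob", 41)}
new_dump = {"e1": ("Sue", 27), "e3": ("Ann", 33)}
changes = diff_snapshots(old_dump, new_dump)
```

The resulting change lists are what an extractor would forward for incremental warehouse refresh instead of reloading the full dump.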
Implementing extractors
Specification-based or toolkit approach
Data Integration
Warehouse data as materialized view
Initial loading
View maintenance
[Figure: the integrator maintains the warehouse view Sold(item,clerk,age), defined as Sold = Sale ⋈ Emp, over two sources: Sales, holding Sale(item,clerk), and Comp., holding Emp(clerk,age).]
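Initial loading and view maintenance for this example can be sketched in a few lines. This is a minimal illustration, not the integrator from the slides: relations are modeled as Python sets of tuples, Sold is taken to be the natural join of Sale and Emp on clerk, and the function names are hypothetical. On an insert into Sale, the integrator here looks up the matching Emp tuples at the source, which is a query back to a source.

```python
def join(sale, emp):
    """Natural join of Sale(item, clerk) and Emp(clerk, age) on clerk."""
    return {(item, clerk, age)
            for (item, clerk) in sale
            for (c, age) in emp if c == clerk}

def on_sale_insert(sold, new_sale, emp_source):
    """Incrementally add to Sold the tuples produced by one new Sale row."""
    item, clerk = new_sale
    # Query back to the Emp source, restricted to the new sale's clerk.
    delta = {(item, c, age) for (c, age) in emp_source if c == clerk}
    return sold | delta

sale = {("tv", "sue")}
emp = {("sue", 26), ("bob", 41)}
sold = join(sale, emp)                        # initial loading
sold = on_sale_insert(sold, ("radio", "bob"), emp)  # view maintenance
sale.add(("radio", "bob"))                    # the source itself also changed
```

After the insert, the incrementally maintained `sold` equals what a full recomputation `join(sale, emp)` would produce, without rescanning Sale.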
Warehouse Self-Maintainability
Goal: No queries back to sources
Research issues:
What views are self-maintainable?
Store auxiliary views so original + auxiliary views are self-maintainable
Examples:
Keep keys
Inserts into Sale, maintain auxiliary view:
Emp − π_{clerk,age}(Sold)
Lots of issues...
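The auxiliary-view idea can be illustrated as follows. This is a sketch under stated assumptions, not the slides' algorithm: only inserts into Sale are handled, Emp is assumed not to change, relations are Python sets of tuples, and all function and variable names are hypothetical. The warehouse stores Sold plus the auxiliary view of Emp tuples not already visible in Sold, and processes each insert from those two alone.

```python
def project_clerk_age(sold):
    """Projection of Sold onto (clerk, age)."""
    return {(clerk, age) for (_item, clerk, age) in sold}

def build_aux(emp, sold):
    """Auxiliary view Emp minus the projection of Sold; built once at load time."""
    return emp - project_clerk_age(sold)

def insert_sale(sold, aux, new_sale):
    """Maintain Sold and the auxiliary view without contacting any source."""
    item, clerk = new_sale
    # Sold's projection plus the auxiliary view reconstructs Emp locally.
    known = project_clerk_age(sold) | aux
    delta = {(item, c, age) for (c, age) in known if c == clerk}
    new_sold = sold | delta
    # Tuples now visible in Sold no longer need to be kept in aux.
    new_aux = aux - project_clerk_age(delta)
    return new_sold, new_aux

emp = {("sue", 26), ("bob", 41)}
sold = {("tv", "sue", 26)}
aux = build_aux(emp, sold)                 # {("bob", 41)}
sold, aux = insert_sale(sold, aux, ("radio", "bob"))
```

Here `emp` is read only once, at load time; the insert itself touches nothing but warehouse-resident data.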
Warehouse Specification
Ideal scenario:
[Figure: components shown are view definitions, a warehousing configuration module, the data warehouse, the integrator, metadata, change detection requirements, and one extractor per info source.]
Optimizations
Update filtering at extractor
Multiple view optimization
If warehouse contains several views
Exploit shared sub-views
Others?
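Update filtering at the extractor could look like the following sketch. It assumes, hypothetically, that the warehouse view keeps only rows satisfying a selection predicate, so the extractor can drop source updates that cannot affect the view before shipping them. The `Update` record, the predicate and the sample data are illustrative and do not come from the slides.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

@dataclass(frozen=True)
class Update:
    kind: str   # "insert" or "delete"
    row: Tuple  # e.g. (clerk, age)

def filter_updates(updates: Iterable[Update],
                   relevant: Callable[[Tuple], bool]) -> List[Update]:
    """Forward only the updates whose row can affect the warehouse view."""
    return [u for u in updates if relevant(u.row)]

# Hypothetical view predicate: only employees younger than 30 are stored.
def under_30(row: Tuple) -> bool:
    return row[1] < 30

stream = [Update("insert", ("sue", 26)),
          Update("insert", ("bob", 41)),
          Update("delete", ("ann", 29))]
shipped = filter_updates(stream, under_30)
```

Only two of the three updates are shipped, which reduces both network traffic and integrator work.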