Professional Documents
Culture Documents
Agenda
Dimension Design Cube Design Partitioning Aggregations Scale-out
Attribute Relationships
1 M relationships between attributes Server simply works better Examples
City State, State Country Day Month, Month Quarter, Quarter Year Product Subcategory Product Category
Rigid
Customer BirthDate City State
Attribute Relationships
Illustration
Country
State
City
Gender
Marital
Age
Customer
Attribute Relationships
Where are they used
Storage
Query performance
Greatly improved effectiveness of in-memory caching Materialized hierarchies when present
Processing performance: Fewer, smaller hash tables result in faster, less memory intensive processing Aggregation design: Algorithm needs relationships in order to design effective aggregations Member properties: Attribute relationships identify member properties on levels
Natural Hierarchies
1:M relation (via attribute relationships) between every pair of adjacent levels Examples
Country-State-City-Customer (natural) Country-City (natural) Age-Gender-Customer (unnatural) Year-Quarter-Month (depends on key columns)
How many quarters and months? 4 & 12 across all years (unnatural) 4 & 12 for each year (natural)
6
Natural Hierarchies
Performance implications
Only natural hierarchies are materialized on disk during processing Unnatural hierarchies are built on the fly during queries (and cached in memory) Server internally decomposes unnatural hierarchies into natural components
Essentially operates like ad hoc navigation path (but somewhat better)
Large Dimensions
Optimizing Processing
Use natural hierarchies
Good attribute/hierarchy relationships forces the AS engine to build smaller DISTINCT queries versus one large and expensive query Consider size of other properties/attributes
Dimension SQL queries are in the form of select distinct Key1, Key2, Name, , RelKey1, RelKey2, from [DimensionTable]
Dimension Processing
Byattribute vs. Bytable
ProcessingGroup property on each Dimension
By default it is set to ByAttribute This is usually the best choice
What is ByTable?
Sends one SQL query per dimension table Can result in better performance in limited cases Memory requirements and resource cost much higher for AS!
Example
Two dimensions each with >25M members ByTable
Took 80% of available memory (25.6GB out of 32GB)
ByAttribute
Only took 9GB out of 32GB
Agenda
Dimension Design Cube Design Partitioning Aggregations Scale-out
10
Cube Dimensions
Dimensions
Consolidate multiple hierarchies into single dimension (unless they are related via fact table) Use role-playing dimensions (e.g., OrderDate, BillDate, ShipDate)avoids multiple physical copies Use parent-child dimensions prudently
No aggregation support
Measure Groups
Common questions
At what point do you split from a single cube and create one or more additional cubes? How many is too many?
Guidance
Look at increase in dimensionality. If significant, and overlap with other measure groups is minimal, consider a separate cube Will users want to analyze measures together? Will calculations need to reference unified measures collection?
12
Dismissible:
By instance or globally Can specify comment in each case
13
Agenda
Dimension Design Cube Design Partitioning Aggregations Scale-out
14
Partitioning
Mechanism to break up large cubes into manageable chunks Measure groups can be partitioned, but not dimensions Measure group can have one or more partitions Fact rows are distributed across partitions as per partitioning scheme Partitioning scheme is managed by DBA (server is unaware) Examples
By Time: Sales for 2001, 2002, 2003, By Geography: Sales for North America, Europe, Asia,
15
Benefits Of Partitioning
Partitions can be added, processed, deleted independently
Update to last months data does not affect prior months partitions Sliding window scenario easy to implement e.g., 24 month window add June 2006 partition and delete June 2004
Benefits Of Partitioning
Partitions can be processed and queried in parallel
Better utilization of server resources Reduced data warehouse load times
18
19
Partition Slices
Defining partition slices allows server to avoid scanning partitions
E.g. Partition { [January 2008] } not scanned for query to [February 2008]
What if I get the error The slice specified for attribute_name attribute is incorrect"
MOLAP processing validates partition data against the defined slice Error raised if data received from source does not match slice specified E.g. Slice = [January] but partition query incorrectly defined to be SELECT FROM facts WHERE Month = February
20
Agenda
Dimension Design Cube Design Partitioning Aggregations Scale-out
21
What Is An Aggregation?
A subtotal of partition data based on a set of attributes from each dimension
Highest-Level Aggregation
Customer All Product All Units Sold 347814123 Sales $345,212,301.30
Customers
All Customers Country State City Name
Products
All Products Category Brand Item SKU
Intermediate Aggregation
countryCode Can US productID sd452 yu678 Units Sold 9456 4623 Sales $23,914.30 $57,931.45
Facts
custID 345-23 563-01 SKU 135123 451236 Units Sold 2 34 Sales $45.67 $67.32
22
(All, All, All) 1 (Country, Item, Quarter) 274,356 (Country, Item, Quarter) 274,356 (Country, Item, Quarter) 274,356 (Name, SKU, Day) 34,264,872,495 (Name, SKU, Day) 34,264,872,495
Do not build aggregations larger than 30% of fact table size (aggregation design algorithm doesnt)
24
25
Agenda
Dimension Design Cube Design Partitioning Aggregations Scale-out
26
AS 2008 Solution
Analysis Server
Analysis Server
SAN storage
.. .
Analysis Server
27
Processing Server
Query Servers
Attach/Detach
Ability to attach/detach a database Attach from any location Attach as read-only or read-write Multiple instances can attach as read-only (shared) Only one instance can attach as read-write (exclusive)
29
30