
IBM Power Systems Technical University

October 18–22, 2010, Las Vegas, NV

Session Title: IBM Smart Analytics System
Session: s002


Speaker Name: Doug Mack

© 2010 IBM Corporation


Analytics are cross industry


What is our risk this morning? Are we using our stimulus funding effectively?

Which treatments are ineffective and should be eliminated to lower costs?

Do we have product issues or fraudulent claims from service?

Our prices are lower than others. Is this sustainable given our costs, or a future threat?

Fast Flexible Affordable

How & when should we adjust plans to reduce churn & expand share?

4000+ GBS Consultants with deep industry knowledge
Industry Specific Warehouse Packs for quick starts

Unlocking the Business Value of Information

59% of managers miss information they should have used
47% of users don't have confidence in their information
42% of managers use wrong information at least once a week

Source: AIIM & Accenture Surveys, 2007

Getting INFORMATION out of operational databases can be problematic


Databases designed for TRANSACTION processing
Data elements designed by developers, not meaningful to analysts
Table/file relationships not well understood; need an expert
Transactions characterized by
  Many users
  Short CPU time required per use
  Small number of records accessed/updated per transaction

Analytical queries characterized by
  Smaller number of users
  Intensive resource utilization (I/O)
  Large number of database records accessed
  Data loads and queries occurring at the same time

Data may not be consolidated
Dirty data

[Figure: Transaction Schema. Operational tables with cryptic names (T01045P, T00032P, T01046P, R02126P, T03140P, T05001P), cryptic column names (for example DSFTCA, DFRTBB, KSGSBB, AGFRCA), and packed/character type codes (for example 3P 0, 5A, 11P 2, 50A).]


The Foundation for Analytics includes:


A data warehouse is a repository of an organization's electronically stored data. Data warehouses are designed to facilitate reporting and analysis. An expanded definition for data warehousing includes tools to extract, transform, and load (ETL) data into the repository, and tools to manage and retrieve metadata.

Business Intelligence is the name for the ability to act on information that may come from a data warehouse.

Data mining is a business process based on advanced technology, which finds unknown and complex relationships in data, producing insight into business issues and predictions to improve business decisions.


IBM Smart Analytics System Schematic


[Schematic: labeled components are Cognos 8 BI; InfoSphere Warehouse (Data Warehouse, Cubing Services, ETL, Data/Text Mining); and Operational Source Systems (Structured/Unstructured Data), which feed the warehouse through ETL (Extract, Transform, and Load).]



Very easy to work with


Data Warehouse Schema

PRODUCTS
  PRODUCT_NUMBER         5P 0
  PRODUCT_DESCRIPTION    42A
  BRAND_CODE             5A
  BRAND_DESCRIPTION      20A
  ORIGIN_CODE            5A
  ORIGIN_DESCRIPTION     20A
  FAMILY_CODE            5A
  FAMILY_DESCRIPTION     20A
  COST                   9P 2
  BASE_PRICE             9P 2
  PRODUCT_WEIGHT         9P 4
  PRODUCT_VOLUME         9P 4
  LOAD_DATE              DATE
  LAST_CHANGE_TIME       TSTP
  STATUS_FLAG            1A

INVOICE_LINES
  INVOICE_NUMBER         7P 0
  INVOICE_LINE_NUMBER    3P 0
  PRODUCT_NUMBER         5P 0
  CUSTOMER_NUMBER        10A
  SELLING_COMPANY        5A
  SUPPLY_WAREHOUSE       5A
  QUANTITY_ORDERED       11P 0
  QUANTITY_SHIPPED       11P 0
  TOTAL_DISCOUNT         9P 2
  NET_PRICE              9P 2
  BASE_PRICE             9P 2
  UNIT_COST              9P 2
  EXTENDED_COST          11P 2
  EXTENDED_PRICE         11P 2
  MARGIN                 11P 2
  SALES_REP              5A
  COMMISSION_VALUE       7P 2
  INVOICE_DATE           DATE
  SHIP_DATE              DATE
  DELIVERY_DATE          DATE
  INVOICE_TIME           TIME
  MONTH_NUMBER           2P 0
  WEEK_NUMBER            2P 0
  LOAD_DATE              (DATE)

CUSTOMERS
  CUSTOMER_NUMBER        10A
  CUSTOMER_NAME          35A
  ADDRESS_LINE_1         35A
  ADDRESS_LINE_2         35A
  CITY                   35A
  STATE_CODE             2A
  ZIP_CODE               10A
  CONTACT_NAME           35A
  TELEPHONE              15A
  SALES_REP_DEFAULT      5A
  CUSTOMER_CATEGORY      5A
  CUSTOMER_CLASS         5A
  REGION_CODE            5A
  LOAD_DATE              DATE
  LAST_CHANGE_TIME       TMSTP
  STATUS_FLAG            1A

Meaningful table and column names
Complex calculations already done
De-normalized design reduced to only 3 tables
Only includes the columns we care about
Dates are true date columns
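
The step from a transaction schema to this warehouse schema can be sketched as a small extract, transform, and load job. The following is only an illustration using Python's standard-library sqlite3 module, not how InfoSphere Warehouse implements it. The operational table OPS_INVLN, its cryptic column names, and the sample rows are hypothetical; the target uses a subset of the INVOICE_LINES columns listed above.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")

# Hypothetical operational table: cryptic names, date stored as a YYYYMMDD number.
conn.execute(
    "CREATE TABLE OPS_INVLN (ILINV INTEGER, ILPRD INTEGER, ILQTY INTEGER, "
    "ILPRC REAL, ILCST REAL, ILDAT INTEGER)"
)
conn.executemany(
    "INSERT INTO OPS_INVLN VALUES (?, ?, ?, ?, ?, ?)",
    [(1001, 42, 3, 10.00, 6.50, 20101018), (1002, 17, 1, 99.00, 80.00, 20101019)],
)

# Warehouse table: meaningful names, calculations stored, ISO dates
# (SQLite has no native DATE type, so ISO-8601 text stands in for one).
conn.execute(
    "CREATE TABLE INVOICE_LINES (INVOICE_NUMBER INTEGER, PRODUCT_NUMBER INTEGER, "
    "QUANTITY_SHIPPED INTEGER, EXTENDED_PRICE REAL, EXTENDED_COST REAL, "
    "MARGIN REAL, INVOICE_DATE TEXT)"
)


def transform(row):
    """Turn one operational row into one warehouse row."""
    invoice, product, qty, price, cost, yyyymmdd = row
    extended_price = round(qty * price, 2)
    extended_cost = round(qty * cost, 2)
    margin = round(extended_price - extended_cost, 2)  # calculation done once, at load time
    invoice_date = datetime.strptime(str(yyyymmdd), "%Y%m%d").date().isoformat()
    return (invoice, product, qty, extended_price, extended_cost, margin, invoice_date)


# Extract, then transform and load.
source_rows = conn.execute(
    "SELECT ILINV, ILPRD, ILQTY, ILPRC, ILCST, ILDAT FROM OPS_INVLN"
).fetchall()
conn.executemany(
    "INSERT INTO INVOICE_LINES VALUES (?, ?, ?, ?, ?, ?, ?)",
    [transform(r) for r in source_rows],
)

warehouse_rows = conn.execute(
    "SELECT INVOICE_NUMBER, MARGIN, INVOICE_DATE FROM INVOICE_LINES ORDER BY INVOICE_NUMBER"
).fetchall()
```

An analyst can then query MARGIN and INVOICE_DATE directly instead of decoding ILPRC, ILCST, and ILDAT on every report.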


Traditional approach to Data Warehousing and Business Intelligence


Evaluate Business Intelligence Tools from several vendors
Evaluate Database Management Systems from several vendors
Evaluate Extract/Transform/Load (ETL) tools from several vendors
Select Vendors (come to contract terms)
Size System
Develop Systems Integration Plan
Develop Operational Skills/Procedures
Build Multi-vendor Support Structure
Install, Configure, Tune
Tune again and again and again

NEW: IBM Smart Analytics System 7700

2x Performance at the cost [1]

Complete end-to-end analytical solution shipped in a matter of weeks versus months, reducing risk and improving time to value
Completely integrated solution based on powerful warehouse infrastructure with a single point of support
Start small and grow big with a proven, flexible modular design that preserves your investment and maintains an optimized design as you grow
Exploit the latest POWER7 architecture
Optimized for analytics with Massively Parallel Processing design
New IBM System Storage with Solid State Drive (SSD) options
InfoSphere Warehouse 9.7 adds Oracle compatibility

Compared to 7600
  4x more cores per module
  50% less space and energy requirements
  2x storage capacity per data module

[1] Performance per data module, and cost comparing equivalently sized systems in raw data terabytes


POWER7 Cognos Module and Control Console

20-40% Performance Improvements [1]

Optional Cognos module is refreshed with Power 740 8-core servers, improving performance and supporting a higher number of concurrent users per module. The design of the Cognos module includes an active-active, workload-balancing architecture that is preconfigured and optimized by IBM before shipping.

The 7700 control console provides another layer above the hardware and software that allows administration over the system as a whole, rather than each component individually. This console allows the user to manage and maintain software for coordinated stack updates (operating system, driver, firmware, and other components).

Cognos Module
  2x performance through optimization at build time
  26-40% performance improvement over POWER6
  High availability for BI built into system

[1] Based on complexity of queries, comparing on a core-to-core basis


Months versus weeks


[Figure: timeline starting in January comparing "Build from scratch" (running to June) with "Pre-Built". Numbered steps shown: 2 Pre-implementation System Sizing; 3 Acquire Components; 4 Installation & Configuration; 5 Testing & Validation. Callout: Faster Results, Less Risk.]


IBM Smart Analytics System 7700


What's in the box?

Analytics Software Options
  Business Intelligence Module (Cognos 8 BI)
  InfoSphere Warehouse Cubing Services
  InfoSphere Warehouse Text Analytics & Data Mining

Data Warehouse Software
  InfoSphere Warehouse
  InfoSphere Warehouse Advanced Workload Management
  Tivoli System Automation

Hardware/OS
  IBM Power 740 Servers (16 core modules)
  IBM System Storage designed for the IBM Smart Analytics System, including SSD acceleration
  AIX 6.1


IBM Smart Analytics System 7700 Transparent Modular Architecture

Foundation Structure
  Foundation Module: 1 module
  Data Module: 1 to x nodes
  User Module: 0 to y nodes
  Failover Module: 0 or (x+y+1)/5 nodes

+

Add-On Modules (modular design + SSD)
  Analytics Module
  BI Module
  Warehouse Packs...
  3rd Party Modules
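
The failover sizing rule from the foundation structure, "0 or (x+y+1)/5 nodes", can be written as a small function. This is a sketch only: the slide does not say how a fractional result is rounded, so rounding up is an assumption here, as are the function and parameter names.

```python
import math


def failover_nodes(data_nodes: int, user_nodes: int, with_failover: bool = True) -> int:
    """Failover node count per the slide's rule: 0, or (x + y + 1) / 5.

    x = data nodes, y = user nodes, and the +1 is the foundation module.
    Rounding up is an assumption; the slide does not state the rounding rule.
    """
    if data_nodes < 1:
        raise ValueError("the slide requires at least 1 data node")
    if user_nodes < 0:
        raise ValueError("user node count cannot be negative")
    if not with_failover:
        return 0
    active_nodes = data_nodes + user_nodes + 1
    return math.ceil(active_nodes / 5)
```

For example, 4 data nodes and no user nodes give (4+0+1)/5 = 1 failover node under this reading.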


Scaling out to support more data or more users


Shared Nothing, Massively Parallel Design

[Figure: shared-nothing layout. A Foundation Module, User Modules (User Module, User Module 2, User Module 3), and Data Modules (Data Module 1 through Data Module 5), each drawn with its own memory (MEM) and four CPUs.]

Expand by adding additional user or data modules

Balanced system design


System modules with optimal processor, memory, and I/O specifications

Scale-out by adding additional system modules


Which always include balanced I/O

Proven best practice for large scale data warehousing
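
The shared-nothing idea can be sketched in a few lines: each row is assigned to exactly one partition by hashing a key, every partition aggregates only its own rows, and the partial results are merged. This is an illustrative model, not the product's implementation; the CRC32 hash, the thread pool, and all names below are assumptions chosen for the example.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_for(key: str, partitions: int) -> int:
    """Map a distribution key to one partition (CRC32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % partitions


def distribute(rows, partitions):
    """Place each (customer, amount) row on exactly one partition."""
    shards = [[] for _ in range(partitions)]
    for customer, amount in rows:
        shards[partition_for(customer, partitions)].append((customer, amount))
    return shards


def local_totals(shard):
    """Work done by one partition, using only its own rows."""
    totals = defaultdict(float)
    for customer, amount in shard:
        totals[customer] += amount
    return dict(totals)


def parallel_totals(rows, partitions=4):
    shards = distribute(rows, partitions)
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        partials = list(pool.map(local_totals, shards))
    merged = {}
    for partial in partials:
        # A key lives on one partition only, so partial results never overlap.
        merged.update(partial)
    return merged


rows = [("C1", 10.0), ("C2", 5.0), ("C1", 2.5), ("C3", 1.0)]
totals = parallel_totals(rows, partitions=3)
```

Because no partition needs another partition's rows, adding a module adds processing, memory, and I/O capacity together, which is the balance the slide describes.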



InfoSphere Warehouse is powered by DB2

DB2 offers many unique, industry-leading capabilities that are advantageous to data warehousing environments
  Shared Nothing Architecture
  Advanced Cost-Based Parallel Query Optimizer
  Flexible Partitioning
  Multi-dimensional Clustering (MDC)
  Materialized Query Tables (MQT)
  Industry-leading compression
  Workload management


Online Analytical Processing (OLAP) Cubing Services

Cubing Services is a multidimensional analysis server that enables OLAP applications to access large data volumes stored inside a DB2 database

Benefits
  Empowers users with ad hoc access to business information, for example: "What is the profitability for Product A across Branches X, Y, Z?"
  Speed-of-thought access to OLAP data managed by DB2
  OLAP and SQL shared access to the same information
  Single point of management, maintenance, and performance tuning
  Accessible via Cognos 8 BI, Microsoft Excel, Alphablox, IBM DataQuant, and Cubeware Cockpit
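
The example question about Product A across branches X, Y, and Z is a slice of a two-dimensional cube (product by branch). A minimal sketch of that idea follows, precomputing profit at every level of both dimensions. The fact rows and names are hypothetical, and this plain-Python rollup only illustrates the concept; it is not the Cubing Services interface.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, branch, revenue, cost)
facts = [
    ("A", "X", 120.0, 90.0),
    ("A", "Y", 80.0, 70.0),
    ("A", "Z", 50.0, 55.0),
    ("B", "X", 200.0, 150.0),
    ("A", "X", 30.0, 20.0),
]


def profit_cube(rows):
    """Aggregate profit for every (product, branch) cell plus the 'ALL' rollups."""
    cube = defaultdict(float)
    for product, branch, revenue, cost in rows:
        profit = revenue - cost
        cube[(product, branch)] += profit   # detail cell
        cube[(product, "ALL")] += profit    # rollup over branches
        cube[("ALL", branch)] += profit     # rollup over products
        cube[("ALL", "ALL")] += profit      # grand total
    return dict(cube)


cube = profit_cube(facts)

# "What is the profitability for Product A across Branches X, Y, Z?"
product_a_by_branch = {branch: cube[("A", branch)] for branch in ("X", "Y", "Z")}
```

Once the cells exist, answering the question is a lookup rather than a scan of the fact rows, which is what makes interactive slicing feel immediate.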

InfoSphere Warehouse


Data Mining & Visualization


InfoSphere Warehouse provides real-time data mining embedded in your warehouse, accessible from any application supporting SQL.

Benefits
  Discover information on customer and product associations and behaviors captured by your application

Examples
  Market basket analysis
  Customer segmentation
  Click stream analysis
  Disease and treatment analysis
  Fraud detection
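
Market basket analysis, the first example above, looks for items that are frequently bought together. The sketch below counts how often each pair of items shares a basket and keeps the pairs above a support threshold. The baskets, the threshold, and the function name are hypothetical, and this is a generic pair-counting illustration rather than the mining algorithm used by InfoSphere Warehouse.

```python
from collections import Counter
from itertools import combinations


def pair_support(baskets, min_support=0.5):
    """Return {item pair: share of baskets containing both items} above a threshold."""
    if not baskets:
        return {}
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) removes repeats and gives each pair one canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: n / total for pair, n in counts.items() if n / total >= min_support}


baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
frequent_pairs = pair_support(baskets, min_support=0.5)
```

In this sample every pair appears in two of four baskets, so each has a support of 0.5; raising the threshold to 0.75 leaves no pairs.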


Modeling & Design InfoSphere Warehouse Design Studio


Embedded Data Movement & Transformation The SQL Warehousing Tool (SQW)
InfoSphere Warehouse provides extract, load, and transform (ELT) capabilities based on the DB2 server engine. SQL Warehousing Tool applications are developed in a fully integrated graphical interface within the InfoSphere Warehouse Design Studio.

Easy to Use & Package
  Graphically build complex transformations within DB2
  Advanced Workflow Control and Scheduling
  Build your Warehouse Source to Target as one Application
Integration
  Automate Text Analytics & Data Mining workloads
  Ability to natively source data from non-DB2 RDBMS
  Scale up and integrate with Information Server/DataStage
Compliance
  Ability to add version management
  Job Monitoring
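
In an ELT flow the raw data is loaded first and the transformation then runs as SQL inside the database engine. The sketch below shows that ordering with Python's standard-library sqlite3 module standing in for the database; it is not SQW output. The staging table STAGE_CUSTOMERS and the sample rows are hypothetical, and the target uses a few of the CUSTOMERS columns from the warehouse schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE STAGE_CUSTOMERS (CUSTOMER_NUMBER TEXT, CUSTOMER_NAME TEXT, STATE_CODE TEXT);
    CREATE TABLE CUSTOMERS (
        CUSTOMER_NUMBER TEXT PRIMARY KEY,
        CUSTOMER_NAME   TEXT,
        STATE_CODE      TEXT,
        LOAD_DATE       TEXT
    );
    """
)

# Extract + Load: raw rows land in the staging table untouched.
conn.executemany(
    "INSERT INTO STAGE_CUSTOMERS VALUES (?, ?, ?)",
    [
        ("C001", "  acme corp ", "nv"),
        ("C002", "Globex", "NY"),
        ("C001", "  acme corp ", "nv"),  # duplicate row from the source
    ],
)

# Transform: cleansing and de-duplication run as one SQL statement in the engine.
conn.execute(
    """
    INSERT INTO CUSTOMERS (CUSTOMER_NUMBER, CUSTOMER_NAME, STATE_CODE, LOAD_DATE)
    SELECT DISTINCT CUSTOMER_NUMBER, UPPER(TRIM(CUSTOMER_NAME)), UPPER(STATE_CODE), DATE('now')
    FROM STAGE_CUSTOMERS
    """
)

customers = conn.execute(
    "SELECT CUSTOMER_NUMBER, CUSTOMER_NAME, STATE_CODE FROM CUSTOMERS ORDER BY CUSTOMER_NUMBER"
).fetchall()
```

Keeping the transformation in SQL means the rows never leave the database, so the work can use the same engine and parallelism as the warehouse queries.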


Oracle/Sun Database Machine (Exadata)


Exadata is positioned as able to do it all (transaction processing and data warehousing), contrary to IBM's workload-optimized systems

Oracle Database Machine   Database Servers   Exadata Storage Servers
Quarter Rack              2                  3
Half Rack                 4                  7
Full Rack                 8                  14

Designed with Oracle RAC shared disk clusters
Provides storage servers to Oracle database
Linux on x86 running Oracle Enterprise Linux
For every Oracle database server there are two additional Exadata servers to perform the I/O, each with 12 disk drives and new Exadata software
You have to license each disk drive in the Exadata servers at $10,000 each, for a total of $1,680,000 for a Full Rack, plus 22% annual maintenance

[Figure: Oracle RAC Scalability. Effective Nodes (1 to 12) plotted against Data Base Nodes in Cluster (1 to 12), with a "Perfect Linear Performance" reference line and regions labeled "Productive Resources" and "Wasted Resources"; the values 2.44 and 1.69 appear on the chart.]

IBM Internal Use Only

Oracle RAC characteristics as shown in Dell RAC InfiniBand Study: http://www.dell.com/downloads/global/power/ps2q07-20070279-Mahmood.pdf
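
The $1,680,000 storage licensing figure follows from the Full Rack configuration: 14 Exadata storage servers with 12 disk drives each, licensed at $10,000 per drive. The short calculation below reproduces it and adds the 22% annual maintenance. The three-year horizon is an assumption made for illustration; the slide states only the annual rate.

```python
STORAGE_SERVERS_FULL_RACK = 14   # from the rack configuration table
DISKS_PER_STORAGE_SERVER = 12    # from the slide
LICENSE_PER_DISK_USD = 10_000    # from the slide
MAINTENANCE_PERCENT = 22         # annual, from the slide


def storage_license_usd(storage_servers: int) -> int:
    """One-time per-disk license cost for the given number of storage servers."""
    return storage_servers * DISKS_PER_STORAGE_SERVER * LICENSE_PER_DISK_USD


license_total = storage_license_usd(STORAGE_SERVERS_FULL_RACK)
annual_maintenance = license_total * MAINTENANCE_PERCENT // 100

YEARS = 3  # assumed horizon, not stated on this slide
three_year_total = license_total + YEARS * annual_maintenance
```

With these inputs the license is 14 x 12 x $10,000 = $1,680,000 and maintenance is $369,600 per year.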


Smart Analytics Systems Oracle Database Machine


Oracle Database Machine (Exadata)   List          Smart Analytics 7700      List
Half Rack (14 TB [1])               $5.15M [2]    1 data module (12 TB)     $3.67M [3]
Full Rack (28 TB)                   $10.13M       2 data modules (24 TB)    $5.07M [3]

50% LESS (compared to Exadata)

Oracle prices do NOT include
  Installation and migration services
  Additional software, servers, and storage required to add analytical applications that are NOT included with Exadata
    No OLAP, Data or Text Mining

DB2 Compression could increase usable data space 3-5x
  DB2 Compression Option is included in the price of IBM Smart Analytics

DB2 requires 60% of the staffing of Oracle
  Study of 400 Power Systems clients [4]

[1] Data volumes represent uncompressed user space
[2] Oracle prices include RAC, Partitioning Option, Advanced Compression, Tuning and Diagnostic packs, and 3 years maintenance and support
[3] IBM prices are based on preliminary 7700 list pricing and include 3 years SW and HW maintenance and support
[4] http://imcomp.torolab.ibm.com/wiki/images/f/f2/FullSolitaire2008report.pdf
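
The "50% LESS" callout can be checked against the list prices on this slide. The sketch below computes the price difference and the price per terabyte for each pairing, since the paired configurations have different capacities. The class and function names are illustrative; the figures are the ones quoted above, in millions of US dollars.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    name: str
    capacity_tb: float        # uncompressed user space, per footnote 1
    list_price_musd: float    # list price in millions of US dollars

    def price_per_tb(self) -> float:
        return self.list_price_musd / self.capacity_tb


def percent_less(reference: Config, candidate: Config) -> float:
    """How much cheaper the candidate's list price is, as a percentage of the reference."""
    return (1 - candidate.list_price_musd / reference.list_price_musd) * 100


exadata_half = Config("Exadata Half Rack", 14, 5.15)
exadata_full = Config("Exadata Full Rack", 28, 10.13)
ibm_one = Config("7700, 1 data module", 12, 3.67)
ibm_two = Config("7700, 2 data modules", 24, 5.07)

full_rack_saving = round(percent_less(exadata_full, ibm_two), 1)
half_rack_saving = round(percent_less(exadata_half, ibm_one), 1)
```

On these numbers the two-module configuration lists at about 50% less than the Full Rack, which matches the callout, while the one-module configuration lists at about 29% less than the Half Rack.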


Trademarks
The following are trademarks of the International Business Machines Corporation in the United States, other countries, or both.
Not all common law marks used by IBM are listed on this page. Failure of a mark to appear does not mean that IBM does not use the mark nor does it mean that the product is not actively marketed or is not significant within its relevant market. Those trademarks followed by ® are registered trademarks of IBM in the United States; all others are trademarks or common law marks of IBM in the United States.

For a complete list of IBM Trademarks, see www.ibm.com/legal/copytrade.shtml:


*, AS/400, e business(logo), DBE, ESCO, eServer, FICON, IBM, IBM (logo), iSeries, MVS, OS/390, pSeries, RS/6000, S/30, VM/ESA, VSE/ESA, WebSphere, xSeries, z/OS, zSeries, z/VM, System i, System i5, System p, System p5, System x, System z, System z9, BladeCenter

The following are trademarks or registered trademarks of other companies.


Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries.
Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom.
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office.
IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency, which is now part of the Office of Government Commerce.
* All other products may be trademarks or registered trademarks of their respective companies.

Notes:
Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here.
IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.
All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions.
This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to change without notice. Consult your local IBM business contact for information on the product or services available in your area.
All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
Prices subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography.
