Research 0
Overview
The setting
Why is data mining a must?
Why is data mining not happening?
A Data Miner's Story
Grand Challenges: Pragmatic
Grand Challenges: Technical
Some case studies
Concluding Remarks
Examples:
Which records represent fraudulent transactions?
Which households are likely to prefer a Ford over a Toyota?
Who's a good credit risk in my customer DB?
Predict for me
The myths
Companies have built up some large and impressive data warehouses
Data mining is pervasive nowadays
Large corporations know how to do it
There are tools and applications that discover valuable information in enterprise databases
The truths
Data is a shambles: most data mining efforts end up not benefiting from existing data infrastructure
Corporations care a lot about data, and are obsessed with understanding customer behavior
They talk a lot about it
An extremely small number of businesses are successfully mining data
The successful efforts are one-off lucky strikes
Ancient Egypt
Data navigation, exploration, and exploitation technology is fairly primitive:
we know how to build massive data stores
we do not know how to exploit them
we do the bookkeeping really well (OLTP)
Inadequate basic understanding of navigation/systems
many large data stores are write-only (= data tomb)
Researcher view
Database
Algorithms and Theory
Systems
Practitioner view
Database
Customer
Systems and integration
Algorithms
Business view
Customer Database
Systems Algorithms
$$$s
Data Mining Based Solution
Grand Challenges
Pragmatic:
Achieving integration and invisibility
Research/Technical:
Solving some serious unaddressed problems
Solution:
integration with operational systems
Take a serious database approach to solving the storage management problem
digiMine Background
Sample Customers
Data Strategy
Case Study Wireless Telco
Churn Modelling and Prediction
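Modeling Process
[Figure: churn modeling process. Recoverable labels: numbered steps Sample (2), Build (3), Score (4), and Assign (5); a customer database feeding a churn model; risk scores of High Risk, Med Risk, and Low Risk; value tiers of High Val, Med Val, and Low Val; and combined segments High Val/High Risk, High Val/Med Risk, and High Val/Low Risk. The labels for steps 1 and 6 are not recoverable.]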
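The sample, build, score, and assign steps named in the figure can be sketched roughly as follows. This is a minimal illustration under loud assumptions, not the process used in the case study: the customer records, the `on_contract` and `monthly_revenue` fields, the rate-per-group "model", and every threshold are hypothetical.

```python
import random
from collections import defaultdict

def sample(customers, n, seed=0):
    """Sample: draw a reproducible random training sample."""
    rng = random.Random(seed)
    return rng.sample(customers, min(n, len(customers)))

def build(training):
    """Build: a toy model, the observed churn rate per contract group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [churned, total]
    for c in training:
        counts[c["on_contract"]][0] += c["churned"]
        counts[c["on_contract"]][1] += 1
    return {g: churned / total for g, (churned, total) in counts.items()}

def score(model, customer, default=0.5):
    """Score: estimated churn probability for one customer."""
    return model.get(customer["on_contract"], default)

def tier(x, low, high):
    """Assign: bucket a number into Low / Med / High (hypothetical cutoffs)."""
    if x < low:
        return "Low"
    if x < high:
        return "Med"
    return "High"

def segment(model, customer):
    """Combine the risk tier and the value tier into one segment label."""
    risk = tier(score(model, customer), 0.3, 0.6)
    value = tier(customer["monthly_revenue"], 30, 80)
    return f"{value} Val / {risk} Risk"

# Hypothetical history: 5 contract customers (1 churned), 5 without (4 churned).
history = (
    [{"on_contract": True, "churned": c, "monthly_revenue": 60} for c in (0, 0, 0, 0, 1)]
    + [{"on_contract": False, "churned": c, "monthly_revenue": 40} for c in (1, 1, 1, 1, 0)]
)
model = build(sample(history, n=10))
print(segment(model, {"on_contract": False, "monthly_revenue": 95}))
```

A production model would replace `build` with a real learner over many variables; the point here is only the shape of the pipeline, in which scoring and tier assignment are separate steps so that risk and value can be combined afterward.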
Detailed data
Integration of CDR, WIG, SMS, Billing
Maintained at detailed level
Integrated data mining
Algorithms tuned to model thousands of variables and millions of rows
Accurate Forecasts
System Robustness
Massively scalable back end system
Flexible architecture to create new variables quickly and easily
Collaborative Service Model
Service model which guarantees success
Combined IQ Model to optimize science and business knowledge
Low cost to create and maintain models
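[Figure: retention strategy grid. Vertical axis: churn probability, from Low to High. Horizontal axis: forecasted LTV, from Negative to High. Strategies at high churn probability: Let them go, Cautiously Defend, Aggressively Defend. Strategies at low churn probability: Bad Behavior Migration, Grow Margin, Nurture / Maintain. Programs named in the grid: Save Program, Contract Renewal, Cost Reducing Programs, Equipment Upgrade, Feature Add, Elite Program, Change Plan, Feature Use, Loyalty Programs.]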
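One way to read the grid is as a lookup from churn probability and forecasted LTV to a headline strategy. The sketch below assumes that reading; the placement of strategies in cells is inferred from the garbled figure, and the function name and both cutoffs are hypothetical.

```python
def retention_strategy(churn_probability, forecasted_ltv,
                       churn_cutoff=0.5, high_ltv_cutoff=1000.0):
    """Map one customer to a headline strategy.

    churn_cutoff and high_ltv_cutoff are hypothetical thresholds; the
    source figure shows only Low/High and Negative/High axis labels.
    """
    high_churn = churn_probability >= churn_cutoff
    if forecasted_ltv < 0:
        return "Let them go" if high_churn else "Bad Behavior Migration"
    if forecasted_ltv < high_ltv_cutoff:
        return "Cautiously Defend" if high_churn else "Grow Margin"
    return "Aggressively Defend" if high_churn else "Nurture / Maintain"

print(retention_strategy(0.8, 2500.0))
print(retention_strategy(0.1, -40.0))
```

Keeping the cutoffs as parameters makes it easy to tune them per market or campaign without touching the mapping itself.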
For Example:
Network System Usage Cost
Mobile-to-land connection costs
Technical operations/support costs
Long-distance costs
Inter-carrier/international subsidy costs
Roaming Costs
Bad Debt Allocation
Many others
For Example:
Deposit Value
Product mix
Average daily balance
Monthly service fees
Technical operations/Support costs
Branch/teller usage
Late payment/Overdraft history
Interest rate
Contract term
Credit Score
Employment history/Income
Recommendations
Collaborative Filtering
Context-Sensitive Approach
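The slide names collaborative filtering without detail. As a rough illustration only, here is a minimal item-based variant using cosine similarity; the rating data, function names, and scoring rule are assumptions, not the system described in the talk.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse rating dicts (user -> rating)."""
    shared = set(a) & set(b)
    dot = sum(a[u] * b[u] for u in shared)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recommend(ratings, user, top_n=3):
    """Item-based collaborative filtering.

    ratings maps user -> {item: rating}. Each unseen item is scored by its
    similarity to the items the user already rated, weighted by those ratings.
    """
    by_item = {}  # item -> {user: rating}
    for u, items in ratings.items():
        for item, r in items.items():
            by_item.setdefault(item, {})[u] = r
    seen = ratings.get(user, {})
    scores = {}
    for candidate, raters in by_item.items():
        if candidate in seen:
            continue
        scores[candidate] = sum(cosine(raters, by_item[s]) * r for s, r in seen.items())
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "ann": {"A": 5, "B": 4},
    "bob": {"A": 5, "B": 5, "C": 4},
    "cat": {"C": 1, "D": 5},
    "dan": {"A": 4},
}
print(recommend(ratings, "dan"))
```

A context-sensitive approach would additionally condition the scores on the situation (page, time, session), which this sketch does not attempt.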
Effectiveness Measurement
How do we measure [honestly] the effectiveness of a model in a context?
Return on Investment (ROI) measurement
Evaluation in the context of the application
A framework and methodology for measurement and evaluation
Build the measurement method as part of the design of the model
An engineering recipe for measurements, and a set of metrics
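One common way to build measurement into the design is to hold out a control group that does not receive the model-driven treatment, then compute ROI from the difference. The sketch below assumes that setup; the function name, the parameters, and the numbers are all hypothetical.

```python
def incremental_roi(treated_retained, treated_total,
                    control_retained, control_total,
                    value_per_customer, campaign_cost):
    """ROI of a retention campaign measured against a held-out control group.

    uplift = retention rate of treated customers minus that of controls.
    gain   = uplift * treated_total * value_per_customer
    ROI    = (gain - campaign_cost) / campaign_cost
    """
    if treated_total <= 0 or control_total <= 0 or campaign_cost <= 0:
        raise ValueError("group sizes and campaign cost must be positive")
    uplift = treated_retained / treated_total - control_retained / control_total
    gain = uplift * treated_total * value_per_customer
    return (gain - campaign_cost) / campaign_cost

# Hypothetical campaign: 90% retention when treated vs 85% in the control group.
roi = incremental_roi(900, 1000, 850, 1000, value_per_customer=200, campaign_cost=5000)
print(round(roi, 2))
```

Without the control group, the retained customers who would have stayed anyway get credited to the model, which is the kind of dishonest measurement the slide warns about.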
Technical Challenges
2. Complexity/understandability tradeoff
Explaining how, when and why a model works
Explaining when a model fails
A Tuning Dial for reducing the complex into the understandable
3. Interestingness
What is an interesting pattern or summary?
How do you measure novelty?
What is unusual? When is it worthy of attention?
Is it low-probability events? High summarization ability? Outliers? Good fits? Bad fits?
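The slide leaves these questions open. Purely as an illustration of two candidate measures that are in common use (not the author's answer), the sketch below computes surprisal for a low-probability event and lift for a co-occurrence pattern; the basket data is hypothetical.

```python
from math import log2

def surprisal(probability):
    """Self-information in bits: rarer events score higher."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -log2(probability)

def lift(transactions, a, b):
    """Lift of the pattern {a, b}: P(a and b) / (P(a) * P(b)).

    Values above 1 mean a and b co-occur more often than independence predicts.
    """
    n = len(transactions)
    has_a = sum(1 for t in transactions if a in t)
    has_b = sum(1 for t in transactions if b in t)
    has_both = sum(1 for t in transactions if a in t and b in t)
    if n == 0 or has_a == 0 or has_b == 0:
        return 0.0
    return (has_both / n) / ((has_a / n) * (has_b / n))

# Hypothetical market baskets.
baskets = [{"bread", "butter"}, {"bread", "butter"}, {"bread"}, {"milk"}]
print(surprisal(0.25))
print(lift(baskets, "bread", "butter"))
```

Neither measure captures novelty relative to what the analyst already knows, which is part of why the slide treats interestingness as unsolved.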
4. Scalability
Beyond just dealing with a large data set:
Principled feature reduction: what is the SVD equivalent? Graceful degradation with dimensionality
Uncovering graphical structure in data
Communities, relations, link analysis
Dealing with multiple data types:
Structured, sparse, dense, text, images, video, audio, sequence data, etc.
I have yet to see an algorithm that deals with more than one type.
Integration with DBMS
Appropriate sampling
Appropriate operator abstractions
Taking care of minor details
Initialization?
Determining k
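Assuming "Initialization?" and "Determining k" refer to clustering algorithms such as k-means (the slide does not say), the sketch below shows why these details matter: both the starting centers and the rule for picking k are choices the practitioner has to make. The evenly spaced initialization, the elbow threshold `min_gain`, and the data are hypothetical.

```python
def kmeans_1d(points, k, iterations=20):
    """Lloyd's algorithm on 1-D data (requires 1 <= k <= len(points)).

    Initialization is deterministic: centers start at k evenly spaced
    positions in the sorted data. Returns (centers, inertia).
    """
    pts = sorted(points)
    centers = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda j: abs(p - centers[j]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    inertia = sum(min((p - c) ** 2 for c in centers) for p in pts)
    return centers, inertia

def choose_k(points, max_k, min_gain=0.5):
    """Elbow heuristic: stop at the first k where adding one more cluster
    cuts inertia by less than min_gain (a hypothetical threshold)."""
    previous = kmeans_1d(points, 1)[1]
    for k in range(2, max_k + 1):
        current = kmeans_1d(points, k)[1]
        if previous == 0 or (previous - current) / previous < min_gain:
            return k - 1
        previous = current
    return max_k

# Hypothetical data with three visible groups.
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]
print(choose_k(data, max_k=4))
```

The elbow rule is only a heuristic; a different `min_gain` or a different initialization can change the answer, which is the slide's point about "minor details".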
5. A theory for what we do
What are the fundamental abstractions?
What are the basic operations? What are the basic components of an algorithm?
What is it that we are optimizing?
What is hard? What is doable? Why?
What is a data summary?
When are two attributes similar? Can you measure similarity efficiently?
How do we extract the right representation?
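As one possible (and deliberately narrow) answer to the attribute-similarity question, two numeric attributes can be compared by Pearson correlation, which can be computed in a single pass over the rows. This is a sketch of that one notion of similarity under stated assumptions, not a general answer; the function name and data are hypothetical.

```python
from math import sqrt

def pearson_one_pass(pairs):
    """Pearson correlation of two attributes from one scan of (x, y) rows.

    Uses running sums, so memory is O(1). The sum-of-squares formula can lose
    precision on very large values; it is kept here for brevity.
    """
    n = 0
    sx = sy = sxx = syy = sxy = 0.0
    for x, y in pairs:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        syy += y * y
        sxy += x * y
    if n == 0:
        return 0.0
    cov = sxy - sx * sy / n
    var_x = sxx - sx * sx / n
    var_y = syy - sy * sy / n
    if var_x <= 0 or var_y <= 0:
        return 0.0  # a constant attribute has no defined correlation
    return cov / sqrt(var_x * var_y)

print(pearson_one_pass([(1, 2), (2, 4), (3, 6)]))
```

Correlation only captures linear relationships between numeric attributes; categorical or nonlinear similarity would need a different measure, which is why the slide poses this as an open question.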
Summary
Pragmatic and Technical Grand Challenges
Challenges
Predict for me
Data Strategy
Yahoo! Case Study
Evolving the Data Strategy as Chief Data Officer
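[Figure: chart comparing data systems. Axis ticks: 500, 1,000, 2,000 (units not recoverable). Labels recovered from the chart: Amazon, Walmart, warehouse, Telecom, Warehouse, AT&T, Y! Panama, Y! LiveStor, Korea, Y! Main, SABRE, VISA, NYSE, Y! Panama, Y! Data Highway.]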
To be continued
Thank You! & Questions?
Usama_fayyad@yahoo.com