Specializes in
Application architecture and design
SQL Performance Tuning and Optimization
Agile, Scrum
Certified Scrum Trainer
Technology aficionado
Silverlight
ASP.NET
Windows Forms
Admin Stuff
Attendance
Hands On Lab
Homework Certificate
Course Website
http://bit.ly/UTSSQL
Resources
http://sharepoint.ssw.com.au/Training/UTSSQL/
Course Overview
Session 1: Tuesday 01-05-2012, 18:00 - 21:00, Topic: SSIS and Creating a Data Warehouse
Tuesday 08-05-2012 Tuesday 15-05-2012 Tuesday 22-05-2012 Tuesday 29-05-2012
18:00 - 21:00
18:00 - 21:00
Reporting Services
18:00 - 21:00
18:00 - 21:00
Data Mining
Last week(s)
The plan
Step by step to BI
1. Create Data Warehouse
2. Copy data to data warehouse
3. Create OLAP Cubes
4. Create Reports
5. Browse the cube
6.
Agenda
Algorithms
Demo Hands on Lab
Marketing
Who picks the movie? The kids, the wife, me
Who are our customers and what sort of films do they hire?
Is a 30-year-old woman with 2 children going to hire Arnie's latest film?
Validation
Is this data sensible? Terminator 2 and Toy Story
Prediction
Get new information from data: future trends, past trends, outliers, maximums, minimums
Analyse data from different perspectives and summarise it into useful information
New information to increase revenue, cut costs, or both :-)
Who are our biggest customers?
What are customers buying with cigars?
What are the customer retention levels of our branches?
Which customers have bought olives, feta cheese but no ciabatta bread?
Which regions have the highest male/female ratio of single 20-somethings?
Which region has the lowest customer retention levels, and who are the lost customers?
Huge amount of data
Good raw material makes for good data mining
Samples should be representative
Samples "similar" to domain
Not an all-seeing crystal ball
Verify and Validate!
OLAP
Is about fast ad hoc querying
Analysis by dimensions and measures
Gives precise answers
Data Mining
May use RDBMS or OLAP source
Is about discovering and predicting
Gives imprecise answers
OLAP is not a prerequisite for data mining, but it almost always comes first
Classification algorithms
predict one or more discrete variables, based on the other attributes in the dataset
Regression algorithms
predict one or more continuous variables, such as profit or loss, based on other attributes in the dataset
Segmentation algorithms
divide data into groups, or clusters, of items that have similar properties
Association algorithms
Decision Trees
Clustering
Time Series
Neural Network
Association
Naive Bayes
Sequence Clustering
Linear Regression
Logistic Regression
Decision trees
Decision Trees assign (classify) each case to one of a few (discrete) broad categories of the selected attribute (variable) and explain the classification with a few selected input variables
The process of building is recursive partitioning: splitting data into partitions and then splitting them up further
Initially all cases are in one big box
The algorithm tries all possible breaks in classes using all possible values of each input attribute; it then selects the split that partitions data to the purest classes of the searched variable
Decision trees are used for classification and prediction
Typical questions:
Predict which customers will leave
Help in mailing and promotion campaigns
Explain reasons for a decision
What are the movies young female customers like to buy?
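The split-selection step described above (try every break, keep the one that gives the purest classes) can be sketched in a few lines. This is only an illustration of one level of recursive partitioning, not the Microsoft Decision Trees implementation. Gini impurity is assumed as the purity measure, and the customer rows and attribute names are invented for the example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def best_split(rows, target):
    """Try every (attribute, value) equality test and return the purest split.

    rows   -- list of dicts, one per case
    target -- name of the attribute to predict
    Returns (weighted_impurity, attribute, value) or None if no split exists.
    """
    best = None
    attributes = [a for a in rows[0] if a != target]
    for attr in attributes:
        for value in sorted({r[attr] for r in rows}):
            left = [r[target] for r in rows if r[attr] == value]
            right = [r[target] for r in rows if r[attr] != value]
            if not left or not right:
                continue  # not a real split
            # Weighted impurity of the two partitions; lower is purer.
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, attr, value)
    return best

# Hypothetical rental data, invented for illustration only.
customers = [
    {"age_group": "young", "has_kids": "no", "genre": "action"},
    {"age_group": "young", "has_kids": "no", "genre": "action"},
    {"age_group": "adult", "has_kids": "yes", "genre": "family"},
    {"age_group": "adult", "has_kids": "yes", "genre": "family"},
    {"age_group": "adult", "has_kids": "no", "genre": "action"},
    {"age_group": "young", "has_kids": "yes", "genre": "family"},
]

score, attr, value = best_split(customers, "genre")
print(attr, value, score)
```

A full tree would call best_split again on each partition until the partitions are pure or too small to split.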
Naive Bayes
Bayes formula
Uses statistics to say whether a case falls into a certain category or not, with a probability
Spam filtering: spam score (Bayes)
Testing only a particular attribute
Naive Bayes
Quickly builds mining models that can be used for classification and prediction
It calculates probabilities for each possible state of the input attribute, given each state of the predictable attribute
This can later be used to predict an outcome of the predicted attribute based on the known input attributes
This makes the model a good option for exploring the data
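As a rough sketch of the idea above, the code below counts how often each input state occurs for each state of the predictable attribute, then combines those conditional probabilities with Bayes' formula to score a new case. It is a minimal illustration without smoothing, not the Microsoft Naive Bayes algorithm. The messages and the attribute names (has_link, mentions_offer, label) are hypothetical.

```python
from collections import Counter, defaultdict

def train(rows, target):
    """Count input states per state of the predictable attribute."""
    class_counts = Counter(r[target] for r in rows)
    # state_counts[(attribute, state, target_state)] = number of cases
    state_counts = defaultdict(int)
    for r in rows:
        for attr, state in r.items():
            if attr != target:
                state_counts[(attr, state, r[target])] += 1
    return class_counts, state_counts

def predict(model, case):
    """Return a probability for each target state, given the known inputs."""
    class_counts, state_counts = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n in class_counts.items():
        p = n / total  # prior probability of the target state
        for attr, state in case.items():
            # P(input state | target state); no smoothing in this sketch
            p *= state_counts[(attr, state, cls)] / n
        scores[cls] = p
    norm = sum(scores.values())
    return {cls: (p / norm if norm else 0.0) for cls, p in scores.items()}

# Hypothetical training messages, invented for illustration only.
messages = [
    {"has_link": "yes", "mentions_offer": "yes", "label": "spam"},
    {"has_link": "yes", "mentions_offer": "no", "label": "spam"},
    {"has_link": "no", "mentions_offer": "yes", "label": "spam"},
    {"has_link": "no", "mentions_offer": "no", "label": "ham"},
    {"has_link": "yes", "mentions_offer": "no", "label": "ham"},
    {"has_link": "no", "mentions_offer": "no", "label": "ham"},
]

model = train(messages, "label")
result = predict(model, {"has_link": "yes", "mentions_offer": "no"})
print(result)
```

The "naive" part is the assumption that the input attributes are independent of each other given the target state, which is why the conditional probabilities can simply be multiplied.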
Objects within a cluster have high similarity based on the attribute values
Partitioning methods
Hierarchical methods
Density based methods
Model based methods
And more
Segments a heterogeneous population into a number of more homogeneous subgroups or clusters
Some typical questions:
Discover distinct groups of customers
Identification of groups of houses in a city
In biology, derive animal and plant taxonomies
Find outliers
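To make "segmenting a population into more homogeneous subgroups" concrete, here is a minimal k-means sketch, which is one example of a partitioning method. It is an assumption-laden illustration: the (age, annual income in thousands) points are invented, the initial centroids are simply the first k points, and no claim is made that this matches how the Microsoft Clustering algorithm works internally.

```python
def kmeans(points, k, iterations=10):
    """Very small k-means: assign points to the nearest centroid, then re-average."""
    centroids = list(points[:k])  # naive initialisation: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Index of the centroid with the smallest squared distance to p.
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters

# Hypothetical customers as (age, annual income in thousands).
customers = [(22, 25), (25, 30), (24, 28), (50, 90), (55, 95), (52, 88)]

centroids, clusters = kmeans(customers, k=2)
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

In practice the attributes would usually be scaled first, since income would otherwise dominate the distance calculation.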
Clustering (figure: customers plotted by Annual Income and Age)
Time series
Sequence clustering
Numbers order the associations by strength
Direction of association (not necessarily true in the other direction)
Association
If you own certain stocks, you may own other ones as well
Probability = thickness of line
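The point that an association has a direction can be shown with the standard support and confidence measures. The sketch below uses invented shopping baskets; the item names are hypothetical, and this is not the Microsoft Association Rules implementation.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for b in baskets if items <= b) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Estimated probability of `consequent` given `antecedent`."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)

# Hypothetical baskets, invented for illustration only.
baskets = [
    {"cigars", "whisky"},
    {"cigars", "whisky"},
    {"whisky"},
    {"whisky", "olives"},
    {"olives"},
]

cigars_to_whisky = confidence(baskets, {"cigars"}, {"whisky"})
whisky_to_cigars = confidence(baskets, {"whisky"}, {"cigars"})
print(cigars_to_whisky, whisky_to_cigars)
```

Here every basket with cigars also contains whisky, but only half of the whisky baskets contain cigars, so the rule is much stronger in one direction than in the other.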
Neural Nets
Let the system learn how to classify data
Neural Network adapts to the new data
Formulate statement/hypothesis
Outcome is known
(Data / Surveys)
1. 70% of data to train network (outcome is known)
2. 30% of data to test network (outcome is known)
3. New data (no survey needed, predict from network)
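The 70% / 30% division can be sketched as a simple shuffle-and-cut. This shows only the data split, not the training of a network; the function name, the fixed seed and the survey records are assumptions made for the example.

```python
import random

def split_train_test(cases, train_fraction=0.7, seed=0):
    """Shuffle the cases and cut them into a training set and a test set."""
    shuffled = list(cases)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical survey records where the outcome is already known.
surveys = [{"customer_id": i, "outcome": i % 2} for i in range(10)]

train_set, test_set = split_train_test(surveys)
print(len(train_set), len(test_set))
```

The network is fitted on train_set only. Because the outcome is also known for test_set, its predictions there can be checked before the model is used on new data.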
Predicting a sequence.
For example, use market basket analysis to suggest additional products to a customer for purchase.
For example, segment demographic data into groups to better understand the relationships between attributes: Microsoft Clustering Algorithm, Microsoft Sequence Clustering Algorithm
There is more...
Visual Numerics
http://www.vni.com/company/whitepapers/MicrosoftBIwithNumericalLibraries.pdf
Microsoft SQL Server 2008 Data Mining Add-ins for Microsoft Office 2007
http://www.microsoft.com/downloads/en/details.aspx?familyid=896A493A-2502-4795-94AE-E00632BA6DE7&displaylang=en
Farmers
Supermarket
Figure out how to get you to buy more, and where to put the expensive items
Tip
SSIS 2008 - Data profiling task
Get a profile of the data in a table:
potential candidate keys
length of data values in columns
Null percentage of rows
distribution of values
....
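The kind of profile listed above can be approximated by hand to see what each measure means. The sketch below is plain Python over in-memory rows, not the SSIS Data Profiling task or its output format; the function name, the table rows and the column names are hypothetical.

```python
from collections import Counter

def profile_column(rows, column):
    """Compute a few simple profile measures for one column."""
    values = [r.get(column) for r in rows]
    non_null = [v for v in values if v is not None]
    lengths = [len(str(v)) for v in non_null]
    return {
        "null_percentage": 100.0 * (len(values) - len(non_null)) / len(values),
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
        "distribution": Counter(non_null),
        # A column can only be a candidate key if it has no NULLs and no repeats.
        "candidate_key": len(non_null) == len(values)
        and len(set(values)) == len(values),
    }

# Hypothetical table rows, invented for illustration only.
rows = [
    {"id": 1, "city": "Sydney"},
    {"id": 2, "city": "Sydney"},
    {"id": 3, "city": None},
    {"id": 4, "city": "Perth"},
]

id_profile = profile_column(rows, "id")
city_profile = profile_column(rows, "city")
print(id_profile["candidate_key"], city_profile["null_percentage"])
```

Running a profile like this before building mining models is a quick way to spot columns that are mostly empty or that contain unexpected values.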
Resources 1
http://www.sqlservercentral.com/articles/Video/65055/
http://www.sqlservercentral.com/articles/Video/64190/
http://msdn.microsoft.com/en-us/library/ms175595.aspx
Resources 2
Jamie MacLennan
http://blogs.msdn.com/b/jamiemac/
Richard Lees on BI
http://richardlees.blogspot.com/
Summary
Demo
Hands on Lab
3 things
Thank You!
Gateway Court Suite 10 81 - 91 Military Road Neutral Bay, Sydney NSW 2089 AUSTRALIA ABN: 21 069 371 900