You are on page 1of 36

Machine Learning with MATLAB

Stefan Duprey, Application Engineer


stefan.duprey@mathworks.fr

2013 The MathWorks, Inc.1

What You Will Learn

Overview of machine learning

Algorithms available with MATLAB

MATLAB as an interactive environment


for evaluating and choosing the best algorithm

Machine Learning
Basic Concepts

Start with an initial set of data

Learn from this data


Train your algorithm
with this data

Use the resulting model


to predict outcomes
for new data sets

1
Group1

0.9

Group2

0.8

Group3

0.7

Group4
Group5

0.6

Group6
Group7

0.5

Group8

0.4
0.3
0.2
0.1
0

-0.1

0.1

0.2

0.3

0.4

0.5

0.6

Machine Learning
Characteristics and Examples

Characteristics
Lots of data (many variables)
System too complex to know
the governing equation
(e.g., black-box modeling)

Examples

Pattern recognition (speech, images)


Financial algorithms (credit scoring, algo trading)
Energy forecasting (load, price)
Biology (tumor detection, drug discovery)

AAA 93.68%

5.55%

0.59%

0.18%

0.00%

0.00%

0.00%

0.00%

AA 2.44%

92.60%

4.03%

0.73%

0.15%

0.00%

0.00%

0.06%

A 0.14%

4.18%

91.02%

3.90%

0.60%

0.08%

0.00%

0.08%

BBB 0.03%

0.23%

7.49%

87.86%

3.78%

0.39%

0.06%

0.16%

BB 0.03%

0.12%

0.73%

8.27%

86.74%

3.28%

0.18%

0.64%

B 0.00%

0.00%

0.11%

0.82%

9.64%

85.37%

2.41%

1.64%

CCC 0.00%

0.00%

0.00%

0.37%

1.84%

6.24%

81.88%

9.67%

D 0.00%

0.00%

0.00%

0.00%

0.00%

0.00%

0.00%

100.00%

AA

BBB

BB

CCC

AAA

Model Development Process

Exploration

Modeling

Evaluation

Deployment

Exploratory Data Analysis


Gain insight from visual examination

MPG

40

Displacement Acceleration

Identify trends and interactions


Detect patterns
Remove outliers
Shrink data
Select and pare predictors
Feature transformation

Weight

20
20
10
400
200

4000
2000

Horsepower

200
150
100
50
20
MPG

40

10

20

Acceleration

200

400 2000

Displacement

4000

Weight

50 100150200
Horsepow er

Data Exploration
Interactions Between Variables
8

4
6
8

20
3

20

400
200

4000

200
150
100
50
20

40

MPG

10

20

Acceleration

200

400 2000

Displacement

4000

Weight

f(t)

10

2000
Horsepower

4
6
8

Coordinate Value

Weight

Displacement Acceleration

MPG

40

-2

-1

-4

-2

-6

-3
MPG

50 100150200

Acceleration

Displacement

Weight

-8

Horsepower

Horsepow er

Plot Matrix by Group

0.1

0.2

0.4

0.5
t

0.6

0.7

0.8

0.9

Andrews Plot

Parallel Coordinates Plot

chevrolet chevelle malibu buick skylark 320

0.3

plymouth satellite

chevrolet chevelle malibu


buick skylark 320 plymouth satellite

amc rebel sst

chevrolet impala

ford torino

plymouth fury iii

Glyph Plot

ford galaxie 500

pontiac catalina

amc rebel sst

ford torino

ford galaxie 500

chevrolet impala

plymouth fury iii

pontiac catalina

Chernoff Faces
7

Machine Learning Overview


Types of Learning, Categories of Algorithms

Machine
Learning

Type of Learning

Categories of Algorithms

Unsupervised
Learning

Clustering

Group and interpret


data based only
on input data

Classification
Supervised
Learning
Develop predictive
model based on both
input and output data

Regression

Unsupervised Learning
Clustering
K-means,
Fuzzy K-means

Hierarchical

Unsupervised
Learning

Clustering
Neural Network

Machine
Learning

Group and interpret


data based only
on input data

Gaussian
Mixture

Classification
Supervised
Learning
Regression

Clustering
Overview
1

What is clustering?
Segment data into groups,
based on data similarity

0.9
0.8
0.7
0.6

Why use clustering?


Identify outliers
Resulting groups may be
the matter of interest

0.5
0.4
0.3
0.2
0.1
0
-0.1

0.1

0.2

0.3

0.4

0.5

0.6

How is clustering done?


Can be achieved by various algorithms

It is an iterative process (involving trial and error)

10

Dataset Well Be Using

Cloud of randomly generated points


Each cluster center is
randomly chosen inside
specified bounds
Each cluster contains
the specified number
of points per cluster

1
Group1

0.9

Group2

0.8

Group3

0.7

Group4
Group5

0.6

Group6
Group7

0.5

Group8

0.4

Each cluster point


is sampled from a
Gaussian distribution

0.3
0.2
0.1
0

-0.1

0.1

0.2

0.3

0.4

0.5

0.6

Multi-dimensional dataset
11

Example Cluster Analysis


K-Means

K-means is a partitioning method

Partitions data into k mutually


exclusive clusters

Each cluster has a


centroid (or center)

Statistics Toolbox

Sum of distances from


all objects to the center
is minimized

12

Distance Metrics & Group Quality

Distance measures choices


Many built-in distance metrics, or
define your own

Cosine Distance
Useful for clustering variables

>> doc pdist


>> distances = pdist(data,metric); %pdist =
pairwise distances
Cityblock Distance
>> squareform(distances)
Useful for discrete variables
>> kmeans(data,k,distance,cityblock)
%not all metrics supported

Euclidean Distance
Default

Create silhouette plots

>> silhouette(data,clusters)

13

Clustering
Neural Network

Networks are comprised of one or more layers

Outputs computed by
applying a nonlinear
transfer function with
weighted sum of inputs

Trained by letting the network


continually adjust itself
to new inputs (determines weights)

Weights
Input
variables

Transfer
function
Output
Variable

Bias

14

Clustering
Neural Network

Neural Network Toolbox provides


interactive apps for easily
creating and training networks

Multi-layered networks
created by cascading

Neural Network Toolbox

(provide better accuracy)

Example architectures for clustering:


Self-organizing maps
Competitive layers

15

Self Organising Map Neural Net


How it Works

SOM Weight Positions


1.2

Started with a regular grid of


neurons laid over the dataset

0.8

Weight 2

0.6

0.4

0.2

Size of the grid determined


the number of clusters

-0.2
-0.5

0.5

Weight 1

Neurons competed to recognize


data points (by being close to them)

SOM Weight Positions


1
0.9
0.8
0.7

Winning neurons were moved


closer to the data points

0.6

Weight 2

0.5
0.4
0.3

Repeated until convergence

0.2
0.1
0
-0.2

0.2
Weight 1

0.4

0.6

16

Gaussian Mixture Models

Statistics Toolbox

Good when clusters have different


sizes and are correlated

Assume that data is drawn


from a fixed number K
of normal distributions
20

10
1
0
1

0.8
0.6

0.8
0.6

0.4
0.4

0.2

0.2
0

17

Cluster Analysis
Summary

Segments data into groups, based on data similarity

No method is perfect

K-means,
Fuzzy K-means

(depends on data)

Hierarchical

Process is iterative;
explore different algorithms
Beware of local minima

Clustering
Neural Network

Gaussian
Mixture

(global optimization can help)

18

Model Development Process

Exploration

Modeling

Evaluation

Deployment

19

Supervised Learning
Classification for Predictive Modeling

Unsupervised
Learning
Decision Tree

Machine
Learning

Ensemble
Method

Classification
Supervised
Learning
Develop predictive
model based on both
input and output data

Neural Network

Support Vector
Machine

20

Classification
Overview
1
Group1

0.9

What is classification?
Predicting the best group for each point
Learns from labeled observations
Uses input features

Group2

0.8

Group3

0.7

Group4
Group5

0.6

Group6
Group7

0.5

Group8

0.4
0.3
0.2

Why use classification?


Accurately group data never seen before

0.1
0

-0.1

0.1

0.2

0.3

0.4

0.5

0.6

How is classification done?


Can use several algorithms to build a predictive model
Good training data is critical

21

Example Classification
Decision Trees

Statistics Toolbox

Builds a tree from training data


Model is a tree where each node is a
logical test on a predictor

Traverse tree by comparing


features with threshold values

The leaf of the tree


specifies the group

22

Ensemble Learners

Statistics Toolbox

Overview
1.5

Decision trees are weak learners


Good to classify data used to train
Often not very good with new data
Note rectangular groups

group2
group3

group4
group5
group6

x2

group1

group7

0.5

group8

What are ensemble learners?

-0.5

-0.4

-0.2

0.2

0.4

0.6

0.8

1.2

1.4

1.6

x1

Combine many decision trees to create a


strong learner
Uses bootstrapped aggregation

Why use ensemble methods?


Classifier has better predictive power
Note improvement in cluster shapes
23

Decision Trees

Statistics Toolbox

How do I build them with MATLAB?


Build tree model
>> tree = classregtree(x,y);
>> view(tree)
1.5

group1
group2
group3

group5

model on new data


x2

Evaluate the
>> tree(x_new)

group4
group6
group7

0.5

group8

-0.5

-0.4

-0.2

0.2

0.4

0.6

0.8

1.2

1.4

1.6

x1

24

Enhancing the model : Ensemble Learning

Combine weak learners into a stronger learner

>> ens =fitensemble(x,y,'AdaBoostM2',200,'Tree');

Bootstrapped aggregated trees forest

>> ens = fitensemble(x,y,'Bag',200,'Tree,'type','classification');


>> y_pred = predict(ens,x);

Visualise class boundaries

25

K-Nearest Neighbor Classification


One of the simplest classifiers

Takes the K nearest points


from the training set, and
chooses the majority class
of those K points

No training phase all the


work is done during the
application of the model

1.5
group1
group2
group3

group4
group5
group6

x2

Statistics Toolbox

group7

0.5

group8

-0.5

-0.4

-0.2

0.2

0.4

0.6

0.8

1.2

1.4

1.6

x1

26

MATLAB Helps to Manage Complexity

A single calling syntax


for all methods

Documentation helps
you choose an
appropriate algorithm for
your particular problem

27

Support Vector Machines

Statistics Toolbox
(as of R2013a)

Overview

Good for modeling with complex


boundaries between groups
Can be very accurate
No restrictions on the predictors

4
1
2
Support Vectors

What does it do?


Uses non-linear kernel to
calculate the boundaries
Can be computationally intensive

-1

-2
-3

-2

-1

Version in Statistics Toolbox only


classifies into two groups
28

Classification
Summary

Decision Tree

No absolute best method

Ensemble
Method

Simple does not


mean inefficient

Classification
Neural Network

Support Vector
Machine

Watch for overfitting


Decision trees and neural networks may overfit the noise
Use ensemble learning and cross-validation

Parallelize for speedup

29

Supervised Learning
Regression for Predictive Modeling

Unsupervised
Learning

Machine
Learning

Supervised
Learning
Develop predictive
model based on both
input and output data

Linear

Regression

Non-linear

Non-parametric

30

Regression

Statistics Toolbox
Curve Fitting Toolbox

Why use regression?


Predict the continuous response
for new observations

Type of predictive modeling


Specify a model that describes
Y as a function of X
Estimate coefficients that
minimize the difference
between predicted and actual

You can apply techniques from earlier sections with


regression as well (e.g., Neural Network)
31

Linear Regression

Y is a linear function of the regression coefficients

Common examples:
Straight line

= 0 + 11

Plane

= 0 + 11 +22

Polynomial

= 0 + 113 + 212 +31

Polynomial
with cross terms

= 0 + 112 + 2(1 2) + 3 22

32

Nonlinear Regression

Y is a nonlinear function of the regression coefficients

Syntax for formulas:


Fourier Series

y ~ b0 + b1*cos(x*b3) +
0 + 1 cos 3 + 2 sin 3
b4*sin(x*b3)

Exponential Growth

= 0
@(b,t)(b(1)*exp(b(2)*t)

Logistic Growth

0
@(b,t)(1/(b(1)
+ exp( =
1 + 1
b(2)*x)))

33

Generalized Linear Models

Extends the linear model


Define relationship between model and response variable
Model error distributions other than normal

Logistic regression
Response variable is binary (true / false)
Results are typically expressed as an odds ratio

Poisson regression
Model count data (non-negative integers)
Response variable comes from a Poisson distribution

34

Machine Learning with MATLAB

Interactive environment
Visual tools for exploratory data analysis
Easy to evaluate and choose best algorithm
Apps available to help you get started
(e.g,. neural network tool, curve fitting tool)

Multiple algorithms to choose from


Clustering
Classification
Regression

35

Learn More : Machine Learning with MATLAB


http://www.mathworks.com/discovery/
machine-learning.html

Data Driven Fitting


with MATLAB

Multivariate Classification in
the Life Sciences

Classification
with MATLAB

Electricity Load and


Price Forecasting

Regression
with MATLAB

Credit Risk Modeling with


MATLAB

36

You might also like