
SUPPORT VECTOR MACHINES
EMARO Group
Ahmad AD
Daravuth Koung
Debaleena Misra
Fernando Nunez Mendoza
Sukumar Karumuri
Yu-Sin Lin

Date: 09.12.16
Outline
Introduction to Machine Learning
Definition
Classification & Techniques
Support Vector Machine
Definition and Application
Mathematical Detail
Computational Example
Comparison with Neural Network
Conclusion
Machine Learning - Definition
What is Machine Learning?

"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E."
- Tom M. Mitchell, Head of M.L. Dept. at CMU

Spam Email Filtering:
E: Collected data (spam and non-spam emails)
T: Marking each email as spam or not
P: Accuracy (correct decisions / total decisions)
Machine Learning - Classification
By type of problems and tasks
Supervised Learning
Unsupervised Learning
Reinforcement Learning
By output of a machine-learned system
Classification (Two-Class & Multiclass)
Regression
Clustering

[Figure: examples of classification, regression, and clustering]
Machine Learning Techniques
Linear regression (R)
Logistic regression (C)
Trees, forests, and jungles (R & C)
Neural networks (R & C)
Bayes methods (R & C)
K-Means clustering (K)
SVMs (C)
(R = regression, C = classification, K = clustering)
[Figure: Bayes classifier, K-Means clustering, neural network, and decision-tree examples]
History of SVM
Support Vector Machine (SVM) is a classifier derived from statistical learning theory by Vapnik and Chervonenkis.
Said to start in 1979 with Vladimir Vapnik's paper; the major developments came in the 1990s.
SVM became popular because of its success in handwritten digit recognition (NIST, 1998), where it gave accuracy comparable to that of sophisticated neural networks.
What is SVM useful for?
Pattern recognition:
- Object recognition
- Text categorization
- Face recognition
- Handwriting recognition
Bioinformatics (protein classification, cancer classification)
Stock market performance
Linearly Separable Data
[Figure: a training set of two classes separated by a hyper-plane]
Many possible decision boundaries can separate these points into two classes.
Which one to choose?
The SVM Approach
The SVM algorithm seeks to maximise the margin around the separating hyper-plane, so that the decision boundary is as far away from the data of both classes as possible.
The decision function is fully specified by the subset of training samples closest to the hyper-plane, called the support vectors.
[Figure: support vectors and the margin of width d on either side of the hyper-plane]
Thus, we get the widest margin when |a| is at its minimum.
How to optimise?
The Mathematics behind the SVM

Two classes, labelled +1 / -1.
Objective: obtain the best distance margin.
Though H2 divides the sample space, it has a very low distance margin; H3 is the best option.
For a linearly separable point set
In 2D, the hyperplane is a line of the form Ax + By + C = 0; in the notation used here, A and B are the weights a1 and a2, and C is the bias a0, so the hyperplane is a1*x1 + a2*x2 + a0 = 0.
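As an illustration of the decision rule such a hyperplane defines (a hypothetical sketch; the coefficients below are made up, not taken from the slides):

```python
def classify(x, a, a0):
    """Assign class +1 or -1 according to which side of the
    hyper-plane a.x + a0 = 0 the point x falls on."""
    score = sum(ai * xi for ai, xi in zip(a, x)) + a0
    return 1 if score >= 0 else -1

# Example line x1 + x2 - 1 = 0, i.e. a = (1, 1), a0 = -1:
# points above the line are classed +1, points below are -1.
```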
Required optimization
Distance from the origin to hyperplane 1 (a.x + a0 = 1): |1 - a0| / |a|
For hyperplane 2 (a.x + a0 = -1): |-1 - a0| / |a|
Hence the distance between them (the margin) is 2 / |a|
Goal: minimize |a|
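A quick numeric check of the 2/|a| margin width (a minimal pure-Python sketch, not from the slides):

```python
import math

def margin_width(a):
    """Distance between the hyper-planes a.x + a0 = +1 and a.x + a0 = -1,
    which is 2 / |a| regardless of a0."""
    norm = math.sqrt(sum(ai * ai for ai in a))
    return 2.0 / norm

# For a = (3, 4), |a| = 5, so the margin is 2/5 = 0.4.
```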


Solutions
Lagrange function (solving the constrained problem with Lagrange multipliers)
The quadprog function in MATLAB
Using support vectors: support vectors are the training samples used for the computation of the hyperplane.
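For the hard-margin problem (minimize ½|a|² subject to yi(a·xi + a0) ≥ 1), the Lagrange function referred to above is typically written as:

```latex
L(\mathbf{a}, a_0, \boldsymbol{\lambda})
  = \frac{1}{2}\lVert \mathbf{a} \rVert^{2}
  - \sum_{i=1}^{N} \lambda_i \left[ y_i\,(\mathbf{a}\cdot\mathbf{x}_i + a_0) - 1 \right],
  \qquad \lambda_i \ge 0
```

Setting the derivatives with respect to a and a0 to zero yields the dual quadratic program that quadprog (or any QP solver) can handle.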
Extension to non-linear sample sets

Penalty component, or hinge-loss function
Here, E5, E6, E7, E8 are error margins
Kernel functions (explained in a subsequent slide)
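The hinge loss mentioned above can be written, as a minimal sketch, as:

```python
def hinge_loss(y, fx):
    """Hinge loss for a label y in {-1, +1} and classifier output
    fx = a.x + a0: zero when the point is on the correct side of the
    margin (y * fx >= 1), growing linearly with the violation otherwise."""
    return max(0.0, 1.0 - y * fx)
```

Correctly classified points outside the margin contribute nothing, which is what makes the soft-margin objective depend only on the support vectors and the margin violators.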
Non Linear Classification: Kernel

What if the data is not linearly separable?


Non Linear Classification: Kernel

For such data, non-linear classification is performed.
The Gaussian kernel is generally used.
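A minimal sketch of the Gaussian (RBF) kernel, K(x, z) = exp(-|x - z|² / (2σ²)):

```python
import math

def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel: equals 1 when x == z and decays
    towards 0 as the points move apart; sigma controls the decay."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))
```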
Non Linear Classification: Gaussian Kernel
COMPUTATIONAL EXAMPLE

SPAM CLASSIFIER
SPAM CLASSIFIER
Many mail services today provide a spam classifier.
Here a spam-classifier technique based on SVM is presented. (This code was written as part of the Coursera course on Machine Learning.)
SPAM Classifier: Objective
To train a classifier capable of distinguishing between spam and non-spam emails with a certain accuracy.
In other words, we want to predict y = 1 for a spam email and y = 0 for a non-spam email.
SPAM classifier: Preprocessing Steps

Only the body of the dataset emails is considered.
Lower casing
Stripping HTML
Normalizing URLs
Normalizing Email Addresses
Normalizing Numbers
Normalizing Dollars
Word Stemming
Removal of Special Characters
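The actual course code is not reproduced here, but several of the steps above could be sketched with regular expressions (a hypothetical sketch; the placeholder tokens httpaddr, emailaddr, number, dollar are illustrative):

```python
import re

def preprocess_email(body):
    """A minimal sketch of several of the preprocessing steps above."""
    body = body.lower()                                     # lower casing
    body = re.sub(r'<[^<>]+>', ' ', body)                   # strip HTML tags
    body = re.sub(r'(http|https)://\S+', 'httpaddr', body)  # normalize URLs
    body = re.sub(r'\S+@\S+', 'emailaddr', body)            # normalize email addresses
    body = re.sub(r'\d+', 'number', body)                   # normalize numbers
    body = re.sub(r'[$]+', 'dollar', body)                  # normalize dollars
    body = re.sub(r'[^a-z ]', ' ', body)                    # remove special characters
    return body
```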
SPAM classifier: Preprocessing Steps
SPAM classifier: Normalization
SPAM Classifier: Vocabulary List
After preprocessing, a vocabulary list is created.
How do we choose which words to use and which to discard?
Here, only words that occur at least 100 times in the (spam) email dataset are considered.
The list consists of 1899 words.
In practice, a vocabulary list with about 10,000 to 50,000 words is often used.
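The frequency cut-off could be sketched like this (a hypothetical helper, not the course code; the example uses a toy min_count instead of 100):

```python
from collections import Counter

def build_vocabulary(emails, min_count=100):
    """Keep only the words occurring at least min_count times across
    the (already preprocessed) emails, indexed in sorted order."""
    counts = Counter(word for email in emails for word in email.split())
    kept = sorted(word for word, c in counts.items() if c >= min_count)
    return {word: index for index, word in enumerate(kept)}

# On a toy corpus with min_count=2, only the repeated words survive:
vocab = build_vocabulary(["buy now buy", "now or never"], min_count=2)
# -> {'buy': 0, 'now': 1}
```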
Mapping
Once the list is ready, it can be used for mapping the email, i.e. replacing each word with its index in the vocabulary list if applicable, or discarding it otherwise.
SPAM Classifier: Feature Extraction

Converting the email into a feature vector x[i].
The algorithm runs through the mapped data and puts a 1 at the i-th index if the i-th vocabulary word is present in the mapped data.
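The step above can be sketched as (a hypothetical helper; the mapped email is assumed to be a list of vocabulary indices):

```python
def email_to_feature_vector(word_indices, vocab_size):
    """Binary feature vector x: x[i] = 1 if vocabulary word i occurs
    in the mapped email, 0 otherwise."""
    x = [0] * vocab_size
    for i in word_indices:
        x[i] = 1
    return x

# An email mapped to vocabulary indices [0, 2] with a 5-word vocabulary:
# email_to_feature_vector([0, 2], 5) -> [1, 0, 1, 0, 0]
```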
SPAM Classifier: Training
SPAM Classifier: Test Accuracy

Test Accuracy: 98.9%


Brief introduction to NN classification
Linearly separable cases
NNs are heuristic, while SVMs are theoretically founded.
SVM is guaranteed to converge towards the best solution, while an NN may not.
Linearly non-separable cases
Introduce additional neurons in the hidden layer.
Why SVM over NN?

Training is more efficient.
Always gives a global and unique minimum, since it is a convex optimization problem.
Fewer parameters to select: the kernel and the error cost.
Less prone to over-fitting: over-fitting can be controlled by the soft-margin approach.
Weakness of SVM
The choice of kernel function: there is no concrete theory for choosing a kernel function.
Sensitive to noise: a few off-point data can dramatically decrease its performance.
It is a binary classifier: able to classify between only two classes.
Training and testing are slow compared to NN, since it is a constrained quadratic-programming problem.
Conclusion
SVM is a supervised machine-learning technique.
It uses the theory of optimization to find the optimal hyper-plane.
Useful for many applications:
Pattern recognition
Bioinformatics
Stock market performance
Thank you for your attention!
