Professional Documents
Culture Documents
Sunny
Input
Variables
Model Windy
Rainy
Goal: Cloudy
Given input variables,
predict category
Data for Classification
Input Variables Target Variable
Python for Data Science
Classification Classification
• Will it rain • What type of
tomorrow or not? product will this
• Is this transaction customer buy?
legitimate or • Is this tweet
fraudulent positive, negative,
or neutral
Classification Main Points
• Predict category from input variables
Python for Data Science
Input
Model Output
Variables
y=f(ax+b) (y)
(x)
y = mx + b
output
input
Adjusting Model slope m = 2
y-intercept b = +1
Parameters x=1 => y=2*1+1= 3
Python for Data Science
slope m = 2
y-intercept b = -1
x=1 => y=2*1-1=1
Building Machine Learning Model
Model parameters are adjusted during model training to
Python for Data Science
Boundaries
Building vs. Applying Model
Python for Data Science
• Training Phase
• Adjust model parameters
• Use training data
• Testing Phase
• Apply learned model
• Use new data
Building vs. Applying Model
Training
Python for Data Science
Data Build
Model
Model
Learning
Algorithm Training Phase
Test
Data Apply
Results
Model
Model
Testing Phase
Building a Classification Model
Use learning algorithm to model
Python for Data Science
Input
Model Output
Variables
y=f(ax+b) (y)
(x)
Training
Data Build
Model
Model
Learning
Algorithm Training Phase
• kNN
• Decision Tree
• Naïve Bayes
kNN Overview
• Classify sample by looking at its
Python for Data Science
closest neighbors
Decision Tree Overview
Warm- • Tree captures
Python for Data Science
Blooded
multiple
Live Non-
classification
Birth Mammal decision paths
Vertebrate Non-
Mammal
Non-
Mammal
Mammal
Naïve Bayes Overview
Python for Data Science
• Probabilistic approach to
classification
Common Classification Algorithms
Python for Data Science
• kNN
• Decision Tree
• Naïve Bayes
Python for Data Science
Internal Nodes
Leaf Nodes
Classification Using Decision Tree
Root Node
Python for Data Science
Leaf Nodes
Example Decision Tree
Warm- Live Verte- Target
Blooded Birth brate Label
Warm-
Python for Data Science
Blooded
Yes Yes Yes Mammal
Live Non-
Birth Mammal
Vertebrate Non-
Mammal
Non-
Mammal
Mammal
Constructing
Decision Tree Tree
Induction
Python for Data Science
Gini
Index
Lower = More
pure
What Variable to Split On?
Python for Data Science
Defaults Repays
Tree Induction Example: Split 2
Python for Data Science
Defaults Repays
Tree Induction Example: Split 3
Defaults Repays
Python for Data Science
Tree Induction Example
• Resulting model
Python for Data Science
Decision Boundaries
• Rectilinear = Parallel to axes
Python for Data Science
Decision Tree for Classification
Python for Data Science
Decision Tree