Training Set (rows recovered from the figure)
Tid  Attrib1  Attrib2  Attrib3  Class
3    No       Small    70K      No
6    No       Medium   60K      No

Apply Model

Test Set
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
15   No       Large    67K      ?
Examples of Classification Task
• Predicting tumor cells as benign or malignant
[Figure: Training Set (row recovered: Tid 6, No, Medium, 60K, No) -> Decision Tree model -> Apply Model -> Test Set]

Test Set
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
15   No       Large    67K      ?
Apply Model to Test Data
Start from the root of the tree.

Test Data
Refund  Marital Status  Taxable Income  Cheat
No      Married         80K             ?

Decision tree
Refund
  Yes: NO
  No: MarSt
    Single, Divorced: TaxInc
      < 80K: NO
      > 80K: YES
    Married: NO
Apply Model to Test Data
Test Data: Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?
Path through the tree: Refund = No -> MarSt = Married -> leaf NO
Assign Cheat to “No”
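The traversal walked through on these slides can be sketched as a small function. This is only an illustration of the tree shown above, not code from the slides: the function name `classify` and its parameter names are hypothetical, and because the slides label the income branches only as "< 80K" and "> 80K", the handling of exactly 80K is an assumption.

```python
# Decision tree from the slides, written as nested conditionals.
# Assumption: the slides do not say what happens at exactly 80K;
# this sketch sends that case to the "No" leaf.
def classify(refund: str, marital_status: str, taxable_income_k: float) -> str:
    """Return the predicted Cheat label ("Yes" or "No") for one record."""
    if refund == "Yes":
        return "No"                      # Refund = Yes -> leaf NO
    if marital_status == "Married":
        return "No"                      # MarSt = Married -> leaf NO
    # Single or Divorced: split on taxable income (in thousands).
    return "Yes" if taxable_income_k > 80 else "No"


# The test record from the slides: Refund = No, Married, 80K.
prediction = classify("No", "Married", 80)
print(prediction)  # No
```

The Married branch is reached before the income test, so the 80K boundary assumption does not affect this record.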
Bayes Classifier
• A probabilistic framework for solving classification problems
• Conditional Probability:
  $P(C \mid A) = \frac{P(A, C)}{P(A)}$
  $P(A \mid C) = \frac{P(A, C)}{P(C)}$
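As a worked step (a standard identity, not text recovered from the slides), the two conditional probability definitions can be combined by solving each for the joint probability $P(A, C)$ and equating the results, which gives Bayes' theorem:

```latex
\begin{align*}
P(A, C) &= P(C \mid A)\, P(A) = P(A \mid C)\, P(C) \\
\Rightarrow\quad P(C \mid A) &= \frac{P(A \mid C)\, P(C)}{P(A)}
\end{align*}
```

This is the form a Bayes classifier needs: the class posterior is expressed through quantities that can be estimated from training data.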
$P(A_1, A_2, \ldots, A_n)$
How to Estimate Probabilities from Data?

Tid  Refund  Marital Status  Taxable Income  Evade
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
(Refund, Marital Status: categorical; Taxable Income: continuous; Evade: class)

• Class: P(C) = Nc/N
– e.g., P(No) = 7/10, P(Yes) = 3/10
• For discrete attributes: P(Ai | Ck) = |Aik| / Nck
– where |Aik| is the number of instances that have attribute value Ai and belong to class Ck, and Nck is the number of instances in class Ck
– Examples:
  P(Status=Married|No) = 4/7
  P(Refund=Yes|Yes) = 0
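The counting estimates above can be reproduced with a short sketch. The data are the Refund, Marital Status and Evade columns of the table; the helper names `prior` and `conditional` are hypothetical and chosen only for this illustration.

```python
from collections import Counter
from fractions import Fraction

# (Refund, Marital Status, Evade) for Tid 1..10, copied from the table.
records = [
    ("Yes", "Single", "No"),
    ("No", "Married", "No"),
    ("No", "Single", "No"),
    ("Yes", "Married", "No"),
    ("No", "Divorced", "Yes"),
    ("No", "Married", "No"),
    ("Yes", "Divorced", "No"),
    ("No", "Single", "Yes"),
    ("No", "Married", "No"),
    ("No", "Single", "Yes"),
]

# Nc: number of records in each class.
class_counts = Counter(evade for _, _, evade in records)


def prior(c: str) -> Fraction:
    """P(C) = Nc / N."""
    return Fraction(class_counts[c], len(records))


def conditional(attr_index: int, value: str, c: str) -> Fraction:
    """P(Ai | Ck) = |Aik| / Nck for a discrete attribute."""
    matches = sum(1 for r in records if r[attr_index] == value and r[2] == c)
    return Fraction(matches, class_counts[c])


print(prior("No"), prior("Yes"))        # 7/10 3/10
print(conditional(1, "Married", "No"))  # 4/7
print(conditional(0, "Yes", "Yes"))     # 0
```

Exact fractions are used here so the output matches the slide values digit for digit.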
How to Estimate Probabilities from Data?
• For continuous attributes:
– Discretize the range into bins
• one ordinal attribute per bin
• violates independence assumption
– Two-way split: (A < v) or (A > v)
• choose only one of the two splits as new attribute
– Probability density estimation:
• Assume attribute follows a normal distribution
• Use data to estimate parameters of distribution
(e.g., mean and standard deviation)
• Once probability distribution is known, can use it
to estimate the conditional probability P(Ai|c)
How to Estimate Probabilities from Data?

Tid  Refund  Marital Status  Taxable Income  Evade
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
(Refund, Marital Status: categorical; Taxable Income: continuous; Evade: class)

• Normal distribution:
  $P(A_i \mid c_j) = \frac{1}{\sqrt{2\pi\sigma_{ij}^2}} \, e^{-\frac{(A_i - \mu_{ij})^2}{2\sigma_{ij}^2}}$
– One for each (Ai, cj) pair
• For (Income, Class=No):
– If Class=No
  • sample mean = 110
  • sample variance = 2975
  $P(\text{Income} = 120 \mid \text{No}) = \frac{1}{\sqrt{2\pi \cdot 2975}} \, e^{-\frac{(120 - 110)^2}{2 \cdot 2975}} \approx 0.0072$
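The sample mean, sample variance and density for (Income, Class=No) can be checked numerically. This is a minimal sketch: the incomes are the Taxable Income values of the seven Evade = No records in the table, and `gaussian_density` is a hypothetical helper name.

```python
import math
from statistics import mean, variance

# Taxable Income (in K) of the records with Evade = No: Tid 1, 2, 3, 4, 6, 7, 9.
incomes_no = [125, 100, 70, 120, 60, 220, 75]

mu = mean(incomes_no)       # sample mean
var = variance(incomes_no)  # sample variance (divides by n - 1)


def gaussian_density(x: float, mu: float, var: float) -> float:
    """Normal density with mean mu and variance var, evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


p = gaussian_density(120, mu, var)
print(round(p, 4))  # 0.0072
```

Note that `statistics.variance` divides by n - 1, which is what reproduces the slide's value of 2975; the population variance (`statistics.pvariance`) would give a different number.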
m-estimate: $P(A_i \mid C) = \frac{N_{ic} + mp}{N_c + m}$
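A small sketch shows how the m-estimate avoids a zero conditional probability. The surviving slide text does not define m and p; the sketch assumes the usual reading, where p is a prior guess for the probability and m is the weight given to that guess. The values m = 3 and p = 1/3 are hypothetical and chosen only for illustration.

```python
def m_estimate(n_ic: int, n_c: int, m: float, p: float) -> float:
    """m-estimate of P(Ai | C): (N_ic + m * p) / (N_c + m)."""
    return (n_ic + m * p) / (n_c + m)


# A case where the raw count estimate is 0 / 3 = 0:
# no record of the class has the attribute value.
# Assumed (hypothetical) settings: m = 3, p = 1/3.
smoothed = m_estimate(n_ic=0, n_c=3, m=3, p=1 / 3)
print(smoothed)

# With m = 0 the formula reduces to the plain count estimate N_ic / N_c.
unsmoothed = m_estimate(n_ic=4, n_c=7, m=0, p=0.5)
print(unsmoothed)
```

A nonzero estimate matters because naive Bayes multiplies the per-attribute probabilities, so a single zero factor would wipe out the whole product for that class.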
Example of Naïve Bayes Classifier

Name           Give Birth  Can Fly  Live in Water  Have Legs  Class
human          yes         no       no             yes        mammals
python         no          no       no             no         non-mammals
salmon         no          no       yes            no         non-mammals
whale          yes         no       yes            no         mammals
frog           no          no       sometimes      yes        non-mammals
komodo         no          no       no             yes        non-mammals
bat            yes         yes      no             yes        mammals
pigeon         no          yes      no             yes        non-mammals
cat            yes         no       no             yes        mammals
leopard shark  yes         no       yes            no         non-mammals
turtle         no          no       sometimes      yes        non-mammals
penguin        no          no       sometimes      yes        non-mammals
porcupine      yes         no       no             yes        mammals
eel            no          no       yes            no         non-mammals
salamander     no          no       sometimes      yes        non-mammals
gila monster   no          no       no             yes        non-mammals
platypus       no          no       no             yes        mammals
owl            no          yes      no             yes        non-mammals
dolphin        yes         no       yes            no         mammals
eagle          no          yes      no             yes        non-mammals

A: attributes (the fractions below match a record with Give Birth = yes, Can Fly = no, Live in Water = yes, Have Legs = no)
M: mammals
N: non-mammals

$P(A \mid M) = \frac{6}{7} \times \frac{6}{7} \times \frac{2}{7} \times \frac{2}{7} \approx 0.06$
$P(A \mid N) = \frac{1}{13} \times \frac{10}{13} \times \frac{3}{13} \times \frac{4}{13} \approx 0.0042$
$P(A \mid M)\,P(M) = 0.06 \times \frac{7}{20} = 0.021$
$P(A \mid N)\,P(N) = 0.0042 \times \frac{13}{20} \approx 0.0027$
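The computation in this example can be reproduced from the table with a short sketch. The rows are transcribed from the table; the test record (Give Birth = yes, Can Fly = no, Live in Water = yes, Have Legs = no) is an assumption inferred from the fractions in the example, and the helper names `likelihood` and `score` are hypothetical.

```python
from fractions import Fraction

# (give_birth, can_fly, live_in_water, have_legs, class), one tuple per table row.
animals = [
    ("yes", "no", "no", "yes", "mammals"),             # human
    ("no", "no", "no", "no", "non-mammals"),           # python
    ("no", "no", "yes", "no", "non-mammals"),          # salmon
    ("yes", "no", "yes", "no", "mammals"),             # whale
    ("no", "no", "sometimes", "yes", "non-mammals"),   # frog
    ("no", "no", "no", "yes", "non-mammals"),          # komodo
    ("yes", "yes", "no", "yes", "mammals"),            # bat
    ("no", "yes", "no", "yes", "non-mammals"),         # pigeon
    ("yes", "no", "no", "yes", "mammals"),             # cat
    ("yes", "no", "yes", "no", "non-mammals"),         # leopard shark
    ("no", "no", "sometimes", "yes", "non-mammals"),   # turtle
    ("no", "no", "sometimes", "yes", "non-mammals"),   # penguin
    ("yes", "no", "no", "yes", "mammals"),             # porcupine
    ("no", "no", "yes", "no", "non-mammals"),          # eel
    ("no", "no", "sometimes", "yes", "non-mammals"),   # salamander
    ("no", "no", "no", "yes", "non-mammals"),          # gila monster
    ("no", "no", "no", "yes", "mammals"),              # platypus
    ("no", "yes", "no", "yes", "non-mammals"),         # owl
    ("yes", "no", "yes", "no", "mammals"),             # dolphin
    ("no", "yes", "no", "yes", "non-mammals"),         # eagle
]


def likelihood(record, cls):
    """P(A | class) as a product of per-attribute count ratios (naive assumption)."""
    rows = [a for a in animals if a[4] == cls]
    result = Fraction(1)
    for i, value in enumerate(record):
        result *= Fraction(sum(1 for a in rows if a[i] == value), len(rows))
    return result


def score(record, cls):
    """P(A | class) * P(class), the quantity compared across classes."""
    class_prior = Fraction(sum(1 for a in animals if a[4] == cls), len(animals))
    return likelihood(record, cls) * class_prior


# Assumed test record, inferred from the fractions in the example.
record = ("yes", "no", "yes", "no")
scores = {c: score(record, c) for c in ("mammals", "non-mammals")}
prediction = max(scores, key=scores.get)
print({c: round(float(s), 4) for c, s in scores.items()}, prediction)
```

The class with the larger product is chosen, so under these assumptions the record is labeled as a mammal.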