Naïve Bayes
Ke Chen
COMP24111 Machine Learning
Outline
Background
Probability Basics
Probabilistic Classification
Naïve Bayes
Principle and Algorithms
Example: Play Tennis
Relevant Issues
Summary
Background
There are three methods to establish a classifier:
a) Model a classification rule directly. Examples: k-NN, decision trees, perceptron, SVM.
b) Model the probability of class memberships given input data, P(C|X): discriminative models.
c) Model a probabilistic description of the data within each class, P(X|C): generative models such as naïve Bayes.
Probability Basics
Prior, conditional and joint probability for random variables:
- Prior probability: $P(X)$
- Conditional probability: $P(X_1 \mid X_2)$, $P(X_2 \mid X_1)$
- Joint probability: $X = (X_1, X_2)$, $P(X) = P(X_1, X_2)$
- Relationship: $P(X_1, X_2) = P(X_2 \mid X_1) P(X_1) = P(X_1 \mid X_2) P(X_2)$
- Independence: $P(X_2 \mid X_1) = P(X_2)$, $P(X_1 \mid X_2) = P(X_1)$, hence $P(X_1, X_2) = P(X_1) P(X_2)$

Bayes' rule:
$P(C \mid X) = \frac{P(X \mid C)\, P(C)}{P(X)}$, i.e. Posterior = (Likelihood x Prior) / Evidence
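As a minimal numerical sketch of the rule in Python (the probability values below are made-up for illustration, not from the slides):

# Bayes' rule: posterior = likelihood * prior / evidence
prior = 0.3        # P(C), hypothetical
likelihood = 0.6   # P(X|C), hypothetical
evidence = 0.45    # P(X), hypothetical
posterior = likelihood * prior / evidence   # P(C|X)
print(posterior)   # 0.4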
Probability Basics (cont.)
Quiz (events A, B and C are defined in the accompanying figure, omitted here):
1) $P(A)$ = ?
2) $P(B)$ = ?
3) $P(C)$ = ?
4) $P(A \mid B)$ = ?
5) $P(C \mid A)$ = ?
6) $P(A, B)$ = ?
7) $P(A, C)$ = ?
8) Is $P(A, C)$ equal to $P(A) P(C)$?
Probabilistic Classification
Establishing a probabilistic model for classification
- Discriminative model: $P(C \mid X)$, where $C = c_1, \ldots, c_L$ and $X = (X_1, \ldots, X_n)$.
[Figure: a discriminative probabilistic classifier maps an input $x = (x_1, x_2, \ldots, x_n)$ directly to the posteriors $P(c_1 \mid x), P(c_2 \mid x), \ldots, P(c_L \mid x)$.]
Probabilistic Classification
Establishing a probabilistic model for classification (cont.)
- Generative model: $P(X \mid C)$, where $C = c_1, \ldots, c_L$ and $X = (X_1, \ldots, X_n)$.
[Figure: one generative probabilistic model per class (Class 1, Class 2, ..., Class L); each takes the input $x = (x_1, x_2, \ldots, x_n)$ and outputs the class-conditional likelihood $P(x \mid c_i)$ for its own class.]
Probabilistic Classification (cont.)
MAP classification rule
- MAP: Maximum A Posteriori.
- Assign $x$ to $c^*$ if
$P(C = c^* \mid X = x) > P(C = c \mid X = x) \quad \text{for all } c \neq c^*,\ c = c_1, \ldots, c_L$
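A one-line sketch of the rule in Python, assuming the posteriors $P(c \mid x)$ have already been computed (the values here are hypothetical):

# MAP: choose the class whose posterior P(c|x) is largest
posteriors = {"c1": 0.2, "c2": 0.5, "c3": 0.3}   # hypothetical P(c|x)
c_star = max(posteriors, key=posteriors.get)
print(c_star)   # c2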
Naïve Bayes
Bayes classification:
$P(C \mid X) \propto P(X \mid C) P(C) = P(X_1, \ldots, X_n \mid C) P(C)$
Difficulty: learning the joint probability $P(X_1, \ldots, X_n \mid C)$.
Naïve Bayes classification assumes all input features are conditionally independent given the class:
$P(X_1, X_2, \ldots, X_n \mid C) = P(X_1 \mid X_2, \ldots, X_n, C) P(X_2, \ldots, X_n \mid C) = P(X_1 \mid C) P(X_2, \ldots, X_n \mid C) = P(X_1 \mid C) P(X_2 \mid C) \cdots P(X_n \mid C)$
MAP classification rule: for $x = (x_1, x_2, \ldots, x_n)$, assign the label $c^*$ if
$[P(x_1 \mid c^*) \cdots P(x_n \mid c^*)] P(c^*) > [P(x_1 \mid c) \cdots P(x_n \mid c)] P(c), \quad c \neq c^*,\ c = c_1, \ldots, c_L$
Naïve Bayes
Algorithm: Discrete-valued Features
Learning phase: given a training set S,
for each target value $c_i$ ($c_i = c_1, \ldots, c_L$):
  estimate $\hat{P}(C = c_i)$ with the examples in S;
  for every feature value $a_{jk}$ of each feature $X_j$, estimate $\hat{P}(X_j = a_{jk} \mid C = c_i)$ with the examples in S.
Output: conditional probability tables.
Test phase: given an unknown instance $x' = (a_1, \ldots, a_n)$, look up the tables and assign the label $c^*$ to $x'$ if
$[\hat{P}(a_1 \mid c^*) \cdots \hat{P}(a_n \mid c^*)] \hat{P}(c^*) > [\hat{P}(a_1 \mid c) \cdots \hat{P}(a_n \mid c)] \hat{P}(c), \quad c \neq c^*,\ c = c_1, \ldots, c_L$
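A compact Python sketch of both phases for discrete features (the function names and the (features, label) data layout are assumptions of this sketch, not from the slides):

from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """Learning phase: estimate P(C = ci) and P(Xj = ajk | C = ci) by counting.
    `examples` is a list of (feature_tuple, label) pairs."""
    class_counts = Counter(label for _, label in examples)
    priors = {c: class_counts[c] / len(examples) for c in class_counts}
    cond_counts = defaultdict(int)   # (feature index, value, class) -> count
    for features, label in examples:
        for j, value in enumerate(features):
            cond_counts[(j, value, label)] += 1
    conditionals = {(j, v, c): count / class_counts[c]
                    for (j, v, c), count in cond_counts.items()}
    return priors, conditionals

def classify(x, priors, conditionals):
    """Test phase: MAP rule with the factorised likelihood."""
    def score(c):
        s = priors[c]
        for j, value in enumerate(x):
            s *= conditionals.get((j, value, c), 0.0)  # unseen value -> zero
        return s
    return max(priors, key=score)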
Example: Play Tennis
[Table: the training set of 14 examples (days), each with features Outlook, Temperature, Humidity and Wind and the target Play = Yes/No; 9 positive and 5 negative examples.]
Example: Learning Phase

Outlook     | Play=Yes | Play=No
Sunny       | 2/9      | 3/5
Overcast    | 4/9      | 0/5
Rain        | 3/9      | 2/5

Temperature | Play=Yes | Play=No
Hot         | 2/9      | 2/5
Mild        | 4/9      | 2/5
Cool        | 3/9      | 1/5

Humidity    | Play=Yes | Play=No
High        | 3/9      | 4/5
Normal      | 6/9      | 1/5

Wind        | Play=Yes | Play=No
Strong      | 3/9      | 3/5
Weak        | 6/9      | 2/5

P(Play=Yes) = 9/14, P(Play=No) = 5/14
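These tables can be reproduced with train_naive_bayes from the sketch above; the 14 examples below are the standard Play Tennis training set (stated as an assumption, since the slide's data table is not recoverable, but its counts match the fractions above):

play_tennis = [  # (Outlook, Temperature, Humidity, Wind), Play
    (("Sunny", "Hot", "High", "Weak"), "No"),
    (("Sunny", "Hot", "High", "Strong"), "No"),
    (("Overcast", "Hot", "High", "Weak"), "Yes"),
    (("Rain", "Mild", "High", "Weak"), "Yes"),
    (("Rain", "Cool", "Normal", "Weak"), "Yes"),
    (("Rain", "Cool", "Normal", "Strong"), "No"),
    (("Overcast", "Cool", "Normal", "Strong"), "Yes"),
    (("Sunny", "Mild", "High", "Weak"), "No"),
    (("Sunny", "Cool", "Normal", "Weak"), "Yes"),
    (("Rain", "Mild", "Normal", "Weak"), "Yes"),
    (("Sunny", "Mild", "Normal", "Strong"), "Yes"),
    (("Overcast", "Mild", "High", "Strong"), "Yes"),
    (("Overcast", "Hot", "Normal", "Weak"), "Yes"),
    (("Rain", "Mild", "High", "Strong"), "No"),
]
priors, conditionals = train_naive_bayes(play_tennis)
print(priors["Yes"])                       # 9/14 = 0.642...
print(conditionals[(0, "Sunny", "Yes")])   # 2/9 = 0.222...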
Example: Test Phase

Given a new day x = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong).

Look up the tables:
P(Outlook=Sunny|Play=Yes) = 2/9, P(Outlook=Sunny|Play=No) = 3/5
P(Temperature=Cool|Play=Yes) = 3/9, P(Temperature=Cool|Play=No) = 1/5
P(Humidity=High|Play=Yes) = 3/9, P(Humidity=High|Play=No) = 4/5
P(Wind=Strong|Play=Yes) = 3/9, P(Wind=Strong|Play=No) = 3/5
P(Play=Yes) = 9/14, P(Play=No) = 5/14

Apply the MAP rule:
P(Yes|x) ∝ [P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes)] P(Play=Yes) = (2/9)(3/9)(3/9)(3/9)(9/14) ≈ 0.0053
P(No|x) ∝ [P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No)] P(Play=No) = (3/5)(1/5)(4/5)(3/5)(5/14) ≈ 0.0206

Since 0.0206 > 0.0053, the prediction is Play = No.
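The same arithmetic, checked in Python with exact fractions (and, continuing the earlier sketches, classify returns the same answer):

from fractions import Fraction as F

# Unnormalised posterior scores, exactly as above
p_yes = F(2, 9) * F(3, 9) * F(3, 9) * F(3, 9) * F(9, 14)
p_no  = F(3, 5) * F(1, 5) * F(4, 5) * F(3, 5) * F(5, 14)
print(float(p_yes))   # 0.00529...
print(float(p_no))    # 0.02057...
print("No" if p_no > p_yes else "Yes")   # MAP prediction: No
print(classify(("Sunny", "Cool", "High", "Strong"), priors, conditionals))   # No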
Naïve Bayes
Algorithm: Continuous-valued Features
For a continuous feature, the conditional probability is modelled with a normal (Gaussian) distribution:
$\hat{P}(X_j \mid C = c_i) = \frac{1}{\sqrt{2\pi}\,\sigma_{ji}} \exp\left(-\frac{(X_j - \mu_{ji})^2}{2\sigma_{ji}^2}\right)$
where $\mu_{ji}$ is the mean and $\sigma_{ji}$ the standard deviation of the values of feature $X_j$ over the training examples of class $c_i$.
Learning phase: estimate $\mu_{ji}$ and $\sigma_{ji}$ for every feature/class pair, together with the priors $\hat{P}(C = c_i)$.
Test phase: evaluate the normal densities at the observed feature values and apply the MAP rule.
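A direct transcription of this density into Python (a sketch; the function name is mine):

import math

def gaussian_likelihood(x, mu, sigma):
    """Normal model of P_hat(Xj = x | C = ci) with mean mu and std sigma."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)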
Naïve Bayes
Example: Continuous-valued Features
Treating Temperature as a continuous feature, estimate its mean and variance within each class:
$\mu = \frac{1}{N} \sum_{n=1}^{N} x_n, \qquad \sigma^2 = \frac{1}{N} \sum_{n=1}^{N} (x_n - \mu)^2$
Using the temperature values of the Yes and No examples, the estimates are $\mu_{\text{Yes}} = 21.64$, $\sigma_{\text{Yes}} = 2.35$ and $\mu_{\text{No}} = 23.88$, $\sigma_{\text{No}} = 7.09$, so the learned class-conditional densities are
$\hat{P}(x \mid \text{Yes}) = \frac{1}{2.35\sqrt{2\pi}} \exp\left(-\frac{(x - 21.64)^2}{2 \cdot 2.35^2}\right)$
$\hat{P}(x \mid \text{No}) = \frac{1}{7.09\sqrt{2\pi}} \exp\left(-\frac{(x - 23.88)^2}{2 \cdot 7.09^2}\right)$
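These two densities can be evaluated with the gaussian_likelihood sketch above (x = 19.7 is a hypothetical query temperature, not from the slides):

# Class-conditional densities of temperature at a hypothetical x
p_x_yes = gaussian_likelihood(19.7, 21.64, 2.35)
p_x_no  = gaussian_likelihood(19.7, 23.88, 7.09)
print(p_x_yes, p_x_no)   # multiply by the priors and compare for the MAP rule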
Relevant Issues
Violation of the independence assumption
- For many real-world tasks, $P(X_1, \ldots, X_n \mid C) \neq P(X_1 \mid C) \cdots P(X_n \mid C)$; nevertheless, naïve Bayes often performs surprisingly well anyway.
Zero conditional probability problem
- If no training example of class $c_i$ contains the feature value $a_{jk}$, then $\hat{P}(X_j = a_{jk} \mid C = c_i) = 0$, and consequently $\hat{P}(x_1 \mid c_i) \cdots \hat{P}(a_{jk} \mid c_i) \cdots \hat{P}(x_n \mid c_i) = 0$ during test, regardless of the other feature values.
- For a remedy, conditional probabilities are re-estimated with
$\hat{P}(X_j = a_{jk} \mid C = c_i) = \frac{n_c + m p}{n + m}$
  $n_c$: number of training examples for which $X_j = a_{jk}$ and $C = c_i$
  $n$: number of training examples for which $C = c_i$
  $p$: prior estimate of the probability (e.g. $p = 1/t$ for $t$ possible values of $X_j$)
  $m$: weight given to the prior (the number of "virtual" training examples)
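A sketch of this m-estimate in Python; note that with p = 1/t and m = t it reduces to Laplace (add-one) smoothing:

def m_estimate(n_c, n, p, m):
    """Smoothed estimate of P(Xj = ajk | C = ci).
    n_c: count of examples with Xj = ajk and C = ci
    n:   count of examples with C = ci
    p:   prior estimate (e.g. 1/t for t possible values of Xj)
    m:   weight given to the prior (virtual examples)"""
    return (n_c + m * p) / (n + m)

# A value never seen with class ci (n_c = 0) no longer gives probability 0:
print(m_estimate(0, 9, 1/3, 3))   # 1/12 = 0.0833...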
Summary
- Naïve Bayes is built on the conditional independence assumption: training reduces to estimating a class prior and per-feature conditional probabilities, and testing applies the MAP rule to their product.
- In practice it often remains competitive even when the independence assumption is violated.