
Naïve Bayes Classifier

Ke Chen

COMP24111 Machine Learning

Outline

Background
  Probability Basics
  Probabilistic Classification
Naïve Bayes
  Principle and Algorithms
  Example: Play Tennis
Relevant Issues
Summary

Background

There are three ways to establish a classifier:

a) Model a classification rule directly
   Examples: k-NN, decision trees, perceptron, SVM

b) Model the probability of class membership given the input data
   Example: perceptron with the cross-entropy cost

c) Build a probabilistic model of the data within each class
   Examples: naïve Bayes, model-based classifiers

a) and b) are examples of discriminative classification; c) is an example of generative classification; b) and c) are both examples of probabilistic classification.

Probability Basics

Prior, conditional and joint probability for random variables

Prior probability: $P(X)$

Conditional probability: $P(X_1|X_2),\ P(X_2|X_1)$

Joint probability: $X = (X_1, X_2),\ P(X) = P(X_1, X_2)$

Relationship: $P(X_1, X_2) = P(X_2|X_1)P(X_1) = P(X_1|X_2)P(X_2)$

Independence: $P(X_2|X_1) = P(X_2),\ P(X_1|X_2) = P(X_1),\ P(X_1, X_2) = P(X_1)P(X_2)$

Bayes' rule:
$$P(C|X) = \frac{P(X|C)P(C)}{P(X)} \qquad \text{Posterior} = \frac{\text{Likelihood} \times \text{Prior}}{\text{Evidence}}$$

Probability Basics

Quiz: We have two six-sided dice. When they are rolled, the following events can occur: (A) die 1 lands on side 3, (B) die 2 lands on side 1, and (C) the two dice sum to eight. Answer the following questions (a checking sketch follows):

1) $P(A) = ?$
2) $P(B) = ?$
3) $P(C) = ?$
4) $P(A|B) = ?$
5) $P(C|A) = ?$
6) $P(A, B) = ?$
7) $P(A, C) = ?$
8) Is $P(A, C)$ equal to $P(A)\,P(C)$?
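
One way to check your answers is to enumerate all 36 equally likely outcomes. A minimal Python sketch (my own code, not part of the slides), assuming fair, independent dice:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (die 1, die 2) of two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) under the uniform distribution over outcomes."""
    return Fraction(sum(event(o) for o in outcomes), len(outcomes))

def cond(event, given):
    """Conditional probability P(event | given)."""
    kept = [o for o in outcomes if given(o)]
    return Fraction(sum(event(o) for o in kept), len(kept))

A = lambda o: o[0] == 3           # die 1 lands on side 3
B = lambda o: o[1] == 1           # die 2 lands on side 1
C = lambda o: o[0] + o[1] == 8    # the two dice sum to eight

print(prob(A), prob(B), prob(C))          # 1/6, 1/6, 5/36
print(cond(A, B), cond(C, A))             # 1/6, 1/6
AC = lambda o: A(o) and C(o)              # joint event: only (3, 5)
print(prob(lambda o: A(o) and B(o)))      # 1/36
print(prob(AC))                           # 1/36
print(prob(AC) == prob(A) * prob(C))      # False: A and C are dependent
```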
Probabilistic Classification

Establishing a probabilistic model for classification

Discriminative model: model the posterior $P(C|X)$ directly, where $C = c_1, \dots, c_L$ and $X = (X_1, \dots, X_n)$.

[Figure: a discriminative probabilistic classifier takes an input $x = (x_1, x_2, \dots, x_n)$ and outputs the posterior probabilities $P(c_1|x), P(c_2|x), \dots, P(c_L|x)$.]
Probabilistic Classification

Establishing a probabilistic model for classification (cont.)

Generative model: model the class-conditional distribution $P(X|C)$, where $C = c_1, \dots, c_L$ and $X = (X_1, \dots, X_n)$.

[Figure: one generative probabilistic model per class; given an input $x = (x_1, x_2, \dots, x_n)$, the model for class $i$ outputs the likelihood $P(x|c_i)$, for classes $1, 2, \dots, L$.]
Probabilistic Classification

MAP classification rule (MAP: Maximum A Posteriori)

Assign $x$ to $c^*$ if
$$P(C = c^*|X = x) > P(C = c|X = x) \quad \text{for all } c \neq c^*,\ c \in \{c_1, \dots, c_L\}$$

Generative classification with the MAP rule

Apply Bayes' rule to convert the class-conditional likelihoods into posterior probabilities:
$$P(C = c_i|X = x) = \frac{P(X = x|C = c_i)\,P(C = c_i)}{P(X = x)} \propto P(X = x|C = c_i)\,P(C = c_i) \quad \text{for } i = 1, 2, \dots, L$$
Then apply the MAP rule.
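
In code, generative classification with the MAP rule is just an argmax over likelihood times prior. A minimal sketch with made-up numbers (the priors and likelihoods below are purely illustrative, not from the slides):

```python
# Hypothetical numbers purely for illustration:
priors = [0.5, 0.3, 0.2]          # P(C = c_i)
likelihoods = [0.01, 0.05, 0.02]  # P(X = x | C = c_i)

# P(C = c_i | X = x) is proportional to P(X = x | C = c_i) * P(C = c_i);
# the evidence P(X = x) is the same for every class, so the argmax ignores it.
scores = [lik * pri for lik, pri in zip(likelihoods, priors)]
c_star = scores.index(max(scores))
print(c_star)  # 1, since 0.015 > 0.005 and 0.015 > 0.004
```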
Naïve Bayes

Bayes classification:
$$P(C|X) \propto P(X|C)\,P(C) = P(X_1, \dots, X_n|C)\,P(C)$$
Difficulty: learning the joint probability $P(X_1, \dots, X_n|C)$.

Naïve Bayes classification

Assumption: all input features are conditionally independent given the class!
$$\begin{aligned} P(X_1, X_2, \dots, X_n|C) &= P(X_1|X_2, \dots, X_n, C)\,P(X_2, \dots, X_n|C) \\ &= P(X_1|C)\,P(X_2, \dots, X_n|C) \\ &= P(X_1|C)\,P(X_2|C) \cdots P(X_n|C) \end{aligned}$$

MAP classification rule: for $x = (x_1, x_2, \dots, x_n)$, assign the label $c^*$ if
$$[P(x_1|c^*) \cdots P(x_n|c^*)]\,P(c^*) > [P(x_1|c) \cdots P(x_n|c)]\,P(c) \quad \text{for all } c \neq c^*,\ c \in \{c_1, \dots, c_L\}$$

Naïve Bayes

Algorithm: Discrete-valued Features

Learning Phase: given a training set S,
For each target value $c_i$ ($c_i = c_1, \dots, c_L$):
  $\hat{P}(C = c_i) \leftarrow$ estimate $P(C = c_i)$ with examples in S
  For every feature value $x_{jk}$ of each feature $X_j$ ($j = 1, \dots, F$; $k = 1, \dots, N_j$):
    $\hat{P}(X_j = x_{jk}|C = c_i) \leftarrow$ estimate $P(X_j = x_{jk}|C = c_i)$ with examples in S

Test Phase: given an unknown instance $x = (a_1, \dots, a_n)$, assign the label $c^*$ if
$$[\hat{P}(a_1|c^*) \cdots \hat{P}(a_n|c^*)]\,\hat{P}(c^*) > [\hat{P}(a_1|c) \cdots \hat{P}(a_n|c)]\,\hat{P}(c) \quad \text{for all } c \neq c^*,\ c \in \{c_1, \dots, c_L\}$$

A runnable sketch of these two phases appears below.
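
A minimal sketch of the learning and test phases for discrete features (my own code and naming, not from the slides; it assumes each example is a pair of a feature dictionary and a label, and omits the smoothing discussed later under Relevant Issues):

```python
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (features: dict, label). Returns the priors and an
    estimator p(value, feature, label) for P(X_feature = value | C = label)."""
    label_counts = Counter(label for _, label in examples)
    priors = {c: k / len(examples) for c, k in label_counts.items()}

    cond = defaultdict(lambda: defaultdict(Counter))  # cond[c][feature][value] -> count
    for feats, label in examples:
        for f, v in feats.items():
            cond[label][f][v] += 1

    def p(value, feature, label):
        # Relative frequency of this feature value among examples of this class.
        return cond[label][feature][value] / label_counts[label]

    return priors, p

def predict(feats, priors, p):
    """MAP rule: pick the class maximizing P(c) * prod_f P(x_f | c)."""
    def score(c):
        s = priors[c]
        for f, v in feats.items():
            s *= p(v, f, c)
        return s
    return max(priors, key=score)
```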
Example

Example: Play Tennis

[Table: the Play Tennis training set of 14 examples, each with features Outlook, Temperature, Humidity and Wind, and the label PlayTennis.]
Example

Learning Phase

Outlook      Play=Yes  Play=No
Sunny        2/9       3/5
Overcast     4/9       0/5
Rain         3/9       2/5

Temperature  Play=Yes  Play=No
Hot          2/9       2/5
Mild         4/9       2/5
Cool         3/9       1/5

Humidity     Play=Yes  Play=No
High         3/9       4/5
Normal       6/9       1/5

Wind         Play=Yes  Play=No
Strong       3/9       3/5
Weak         6/9       2/5

P(Play=Yes) = 9/14    P(Play=No) = 5/14

Example of the Naïve Bayes Classifier

The weather data, with counts and probabilities:

outlook      yes  no
sunny        2/9  3/5
overcast     4/9  0/5
rainy        3/9  2/5

temperature  yes  no
hot          2/9  2/5
mild         4/9  2/5
cool         3/9  1/5

humidity     yes  no
high         3/9  4/5
normal       6/9  1/5

windy        yes  no
false        6/9  2/5
true         3/9  3/5

play         yes   no
             9/14  5/14

A new day:

outlook  temperature  humidity  windy  play
sunny    cool         high      true   ?

Likelihood of yes: (2/9) × (3/9) × (3/9) × (3/9) × (9/14) ≈ 0.0053
Likelihood of no: (3/5) × (1/5) × (4/5) × (3/5) × (5/14) ≈ 0.0206

Therefore, the prediction is No.

Example

Test Phase

Given a new instance, predict its label:
x = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)

Look up the tables obtained in the learning phase:

P(Outlook=Sunny|Play=Yes) = 2/9        P(Outlook=Sunny|Play=No) = 3/5
P(Temperature=Cool|Play=Yes) = 3/9     P(Temperature=Cool|Play=No) = 1/5
P(Humidity=High|Play=Yes) = 3/9        P(Humidity=High|Play=No) = 4/5
P(Wind=Strong|Play=Yes) = 3/9          P(Wind=Strong|Play=No) = 3/5
P(Play=Yes) = 9/14                     P(Play=No) = 5/14

Decision making with the MAP rule:

P(Yes|x) ∝ [P(Sunny|Yes)P(Cool|Yes)P(High|Yes)P(Strong|Yes)]P(Play=Yes) ≈ 0.0053
P(No|x) ∝ [P(Sunny|No)P(Cool|No)P(High|No)P(Strong|No)]P(Play=No) ≈ 0.0206

Since P(Yes|x) < P(No|x), we label x as No.
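
The arithmetic above is easy to verify; a quick check of the two scores (my own code, using the looked-up fractions from the learning-phase tables):

```python
p_yes, p_no = 9/14, 5/14

# Class-conditional probabilities for (Sunny, Cool, High, Strong).
like_yes = (2/9) * (3/9) * (3/9) * (3/9)
like_no = (3/5) * (1/5) * (4/5) * (3/5)

score_yes = like_yes * p_yes   # ~0.0053
score_no = like_no * p_no      # ~0.0206
print(score_yes, score_no)
print("No" if score_no > score_yes else "Yes")   # No
```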



Naïve Bayes

Algorithm: Continuous-valued Features

A continuous feature takes numberless (infinitely many) values, so its conditional probability cannot be stored in a table.

Instead, the conditional probability is often modelled with the normal distribution:
$$\hat{P}(X_j|C = c_i) = \frac{1}{\sqrt{2\pi}\,\sigma_{ji}} \exp\left(-\frac{(X_j - \mu_{ji})^2}{2\sigma_{ji}^2}\right)$$
where $\mu_{ji}$ is the mean (average) and $\sigma_{ji}$ the standard deviation of the values of feature $X_j$ over the examples for which $C = c_i$.

Learning Phase: for $X = (X_1, \dots, X_n)$ and $C = c_1, \dots, c_L$,
Output: $n \times L$ normal distributions and $\hat{P}(C = c_i)$ for $i = 1, \dots, L$

Test Phase: given an unknown instance $x = (a_1, \dots, a_n)$,
Instead of looking up tables, calculate the conditional probabilities with the normal distributions obtained in the learning phase
Apply the MAP rule to make a decision
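
A minimal sketch of the two pieces the learning phase needs for a continuous feature (my own helper names, assuming the sample standard deviation is used, as in the example that follows):

```python
import math

def gaussian_pdf(x, mu, sigma):
    # Normal density N(x; mu, sigma^2), used as P(X_j = x | C = c_i).
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def fit_gaussian(values):
    # Estimate (mu, sigma) from the values of one feature within one class,
    # dividing by N - 1 for the variance (sample estimate).
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / (n - 1)
    return mu, math.sqrt(var)
```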

Naïve Bayes

Example: Continuous-valued Features

Temperature is naturally a continuous-valued feature.
Yes: 25.2, 19.3, 18.5, 21.7, 20.1, 24.3, 22.8, 23.1, 19.8
No: 27.3, 30.1, 17.4, 29.5, 15.1

Estimate the mean and standard deviation for each class:
$$\mu = \frac{1}{N}\sum_{n=1}^{N} x_n, \qquad \sigma^2 = \frac{1}{N - 1}\sum_{n=1}^{N} (x_n - \mu)^2$$
$$\mu_{\text{Yes}} = 21.64,\quad \sigma_{\text{Yes}} = 2.35 \qquad \mu_{\text{No}} = 23.88,\quad \sigma_{\text{No}} = 7.09$$
(The quoted standard deviations are the sample estimates, i.e. dividing by $N - 1$.)

Learning Phase: output two Gaussian models for $P(\text{temp}|C)$:
$$\hat{P}(x|\text{Yes}) = \frac{1}{2.35\sqrt{2\pi}} \exp\left(-\frac{(x - 21.64)^2}{2 \cdot 2.35^2}\right)$$
$$\hat{P}(x|\text{No}) = \frac{1}{7.09\sqrt{2\pi}} \exp\left(-\frac{(x - 23.88)^2}{2 \cdot 7.09^2}\right)$$
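
These estimates are easy to reproduce; a quick check (statistics.stdev uses the $N - 1$ sample estimate, which matches the values above):

```python
import statistics

yes = [25.2, 19.3, 18.5, 21.7, 20.1, 24.3, 22.8, 23.1, 19.8]
no = [27.3, 30.1, 17.4, 29.5, 15.1]
print(statistics.mean(yes), statistics.stdev(yes))  # ~21.64, ~2.35
print(statistics.mean(no), statistics.stdev(no))    # ~23.88, ~7.09
```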


Relevant Issues

Violation of the Independence Assumption

For many real-world tasks,
$$P(X_1, \dots, X_n|C) \neq P(X_1|C) \cdots P(X_n|C)$$
Nevertheless, naïve Bayes works surprisingly well anyway!

Zero Conditional Probability Problem

If no training example contains the feature value $X_j = a_{jk}$ for class $c_i$, then $\hat{P}(X_j = a_{jk}|C = c_i) = 0$. In this circumstance, the whole product vanishes during test:
$$\hat{P}(x_1|c_i) \cdots \hat{P}(a_{jk}|c_i) \cdots \hat{P}(x_n|c_i) = 0$$

For a remedy, conditional probabilities are re-estimated with
$$\hat{P}(X_j = a_{jk}|C = c_i) = \frac{n_c + mp}{n + m}$$
where
$n_c$: number of training examples for which $X_j = a_{jk}$ and $C = c_i$
$n$: number of training examples for which $C = c_i$
$p$: prior estimate (usually, $p = 1/t$ for $t$ possible values of $X_j$)
$m$: weight given to the prior (number of "virtual" examples, $m \geq 1$)
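
A one-function sketch of this m-estimate (my own code), applied to the zero count from the Play Tennis tables, where Outlook=Overcast never occurs with Play=No (0 of 5 examples, and Outlook has t = 3 possible values):

```python
from fractions import Fraction

def m_estimate(n_c, n, t, m=1):
    """Smoothed P(X_j = a_jk | C = c_i) with prior p = 1/t and m virtual examples."""
    p = Fraction(1, t)
    return (n_c + m * p) / (n + m)

print(m_estimate(0, 5, 3))   # 1/18 instead of 0
```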



Summary

Naïve Bayes is built on the conditional independence assumption.

Training is very easy and fast: it just requires considering each attribute in each class separately.

Testing is straightforward: just look up tables, or calculate conditional probabilities with the estimated distributions.

Naïve Bayes is a popular generative model:
  Its performance is competitive with most state-of-the-art classifiers, even when the independence assumption is violated
  It has many successful applications, e.g., spam mail filtering
  It is a good candidate for a base learner in ensemble learning
  Apart from classification, naïve Bayes can do more