
Basics on Probability

Jingrui He
09/11/2007
Coin Flips
 You flip a coin
 Heads with probability 0.5
 You flip 100 coins
 How many heads would you expect?
Coin Flips cont.
 You flip a coin
 Heads with probability p
 Binary random variable
 Bernoulli trial with success probability p
 You flip k coins
 How many heads would you expect?
 Number of heads X: discrete random variable
 Binomial distribution with parameters k and p
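A minimal sketch of this experiment (not from the slides), using NumPy: simulate flipping k coins many times and compare the average number of heads with the Binomial mean k·p.

```python
import numpy as np

rng = np.random.default_rng(0)
k, p = 100, 0.5

# each entry is the number of heads in one run of "flip k coins"
heads = rng.binomial(n=k, p=p, size=10_000)

print("simulated average number of heads:", heads.mean())  # close to 50
print("expected number of heads (k*p):   ", k * p)
```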
Discrete Random Variables
 Random variables (RVs) which may take on
only a countable number of distinct values
 E.g. the total number of heads X you get if you
flip 100 coins

 X is a RV with arity k if it can take on exactly one value out of {x_1, …, x_k}
 E.g. the possible values that X can take on are 0,
1, 2,…, 100
Probability of Discrete RV
 Probability mass function (pmf): P(X = x_i)
 Easy facts about pmf
 Σ_i P(X = x_i) = 1
 P(X = x_i ∩ X = x_j) = 0 if i ≠ j
 P(X = x_i ∪ X = x_j) = P(X = x_i) + P(X = x_j) if i ≠ j
 P(X = x_1 ∪ X = x_2 ∪ … ∪ X = x_k) = 1
Common Distributions
 Uniform: X ~ U[1, …, N]
 X takes values 1, 2, …, N
 P(X = i) = 1/N
 E.g. picking balls of different colors from a box
 Binomial: X ~ Bin(n, p)
 X takes values 0, 1, …, n
 P(X = i) = C(n, i) p^i (1 − p)^(n−i)
 E.g. coin flips
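A small sketch (not part of the slides) evaluating both pmfs; the parameter values N = 6, n = 10, p = 0.5 are arbitrary choices, and SciPy supplies the Binomial pmf.

```python
import numpy as np
from scipy import stats

# Discrete uniform on {1, ..., N}: P(X = i) = 1/N
N = 6
uniform_pmf = np.full(N, 1 / N)
print(uniform_pmf.sum())                        # 1.0

# Binomial(n, p): P(X = i) = C(n, i) p^i (1 - p)^(n - i)
n, p = 10, 0.5
i = np.arange(n + 1)
binom_pmf = stats.binom.pmf(i, n, p)
print(binom_pmf.sum())                          # 1.0
print("P(X = 5) =", stats.binom.pmf(5, n, p))   # most likely number of heads
```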
Coin Flips of Two Persons
 Your friend and you both flip coins
 Heads with probability 0.5
 You flip 50 times; your friend flips 100 times
 How many heads will each of you get?
Joint Distribution
 Given two discrete RVs X and Y, their joint
distribution is the distribution of X and Y
together
 E.g. P(you get 21 heads AND your friend gets 70 heads)
 Σ_x Σ_y P(X = x ∩ Y = y) = 1
 E.g. Σ_{i=0}^{50} Σ_{j=0}^{100} P(you get i heads AND your friend gets j heads) = 1
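A sketch of the coin-flip example (my own illustration, not from the slides): since your 50 flips and your friend's 100 flips are independent, the joint pmf is the outer product of the two Binomial pmfs, and it sums to 1.

```python
import numpy as np
from scipy import stats

px = stats.binom.pmf(np.arange(51), 50, 0.5)    # P(you get i heads), i = 0..50
py = stats.binom.pmf(np.arange(101), 100, 0.5)  # P(friend gets j heads), j = 0..100
joint = np.outer(px, py)                        # P(X = i AND Y = j), by independence

print(joint.sum())                              # ~1.0
print("P(X = 21, Y = 70) =", joint[21, 70])
```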
Conditional Probability
 P(X = x | Y = y) is the probability of X = x, given the occurrence of Y = y
 E.g. you get 0 heads, given that your friend gets 61 heads
 P(X = x | Y = y) = P(X = x ∩ Y = y) / P(Y = y)
Law of Total Probability
 Given two discrete RVs X and Y, which take values in {x_1, …, x_m} and {y_1, …, y_n}, we have
 P(X = x_i) = Σ_j P(X = x_i ∩ Y = y_j)
            = Σ_j P(X = x_i | Y = y_j) P(Y = y_j)
Marginalization

 P(X = x_i) = Σ_j P(X = x_i ∩ Y = y_j)        (marginal probability = sum of joint probabilities)
            = Σ_j P(X = x_i | Y = y_j) P(Y = y_j)        (conditional probability × marginal probability)
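A sketch of marginalization on a small made-up joint table (the numbers are illustrative only):

```python
import numpy as np

# Joint pmf P(X = xi, Y = yj): 2 values of X (rows), 3 values of Y (columns)
joint = np.array([[0.10, 0.20, 0.10],
                  [0.05, 0.15, 0.40]])

p_x = joint.sum(axis=1)              # P(X = xi) = sum_j P(X = xi, Y = yj)
p_y = joint.sum(axis=0)              # P(Y = yj)
print("P(X):", p_x)                  # [0.4 0.6]
print("P(Y):", p_y)                  # [0.15 0.35 0.5]

# Same marginal via P(X = xi) = sum_j P(X = xi | Y = yj) P(Y = yj)
cond_x_given_y = joint / p_y         # column j holds P(X | Y = yj)
print("P(X) again:", cond_x_given_y @ p_y)
```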


Bayes Rule
 X and Y are discrete RVs…
 P(X = x | Y = y) = P(X = x ∩ Y = y) / P(Y = y)
 P(X = x_i | Y = y_j) = P(Y = y_j | X = x_i) P(X = x_i) / [ Σ_k P(Y = y_j | X = x_k) P(X = x_k) ]
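A sketch of Bayes rule on made-up numbers: a prior over two values of X and a likelihood table for Y, combined into the posterior P(X | Y = y_1).

```python
import numpy as np

prior = np.array([0.3, 0.7])              # P(X = x1), P(X = x2)
likelihood = np.array([[0.9, 0.1],        # row i: P(Y = y1 | X = xi), P(Y = y2 | X = xi)
                       [0.2, 0.8]])

j = 0                                     # we observe Y = y1
evidence = np.sum(likelihood[:, j] * prior)        # sum_k P(Y = y1 | X = xk) P(X = xk)
posterior = likelihood[:, j] * prior / evidence    # P(X = xi | Y = y1)
print(posterior, posterior.sum())                  # posterior sums to 1
```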
Independent RVs
 Intuition: X and Y are independent means that knowing X = x makes it neither more nor less probable that Y = y
 Definition: X and Y are independent iff
P (X = x ∩ Y = y) = P (X = x) P (Y = y)
More on Independence
 P(X = x ∩ Y = y) = P(X = x) P(Y = y)
 P(X = x | Y = y) = P(X = x)
 P(Y = y | X = x) = P(Y = y)
 E.g. no matter how many heads you get, your friend will not be affected, and vice versa
Conditionally Independent RVs
 Intuition: X and Y are conditionally
independent given Z means that once Z is
known, the value of X does not add any
additional information about Y
 Definition: X and Y are conditionally
independent given Z iff

P(X = x ∩ Y = y | Z = z) = P(X = x | Z = z) P(Y = y | Z = z)
More on Conditional Independence

 P(X = x ∩ Y = y | Z = z) = P(X = x | Z = z) P(Y = y | Z = z)
 P(X = x | Y = y, Z = z) = P(X = x | Z = z)
 P(Y = y | X = x, Z = z) = P(Y = y | Z = z)
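A sketch (with made-up tables) that builds a joint distribution in which X and Y are conditionally independent given Z, and then verifies the definition for every value of z.

```python
import numpy as np

p_z = np.array([0.4, 0.6])                  # P(Z = z)
p_x_given_z = np.array([[0.2, 0.8],         # row z, column x: P(X = x | Z = z)
                        [0.7, 0.3]])
p_y_given_z = np.array([[0.5, 0.5],         # row z, column y: P(Y = y | Z = z)
                        [0.1, 0.9]])

# P(X = x, Y = y, Z = z) = P(X = x | Z = z) P(Y = y | Z = z) P(Z = z)
joint = np.einsum('zx,zy,z->xyz', p_x_given_z, p_y_given_z, p_z)

for z in range(2):
    cond_xy = joint[:, :, z] / p_z[z]       # P(X = x, Y = y | Z = z)
    assert np.allclose(cond_xy, np.outer(p_x_given_z[z], p_y_given_z[z]))
print("factorization holds for every z")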
Monty Hall Problem
 You're given the choice of three doors: Behind one
door is a car; behind the others, goats.
 You pick a door, say No. 1
 The host, who knows what's behind the doors, opens
another door, say No. 3, which has a goat.
 Do you want to pick door No. 2 instead?
[Diagram: depending on where the car is, the host either reveals Goat A or Goat B, must reveal Goat B, or must reveal Goat A]
Monty Hall Problem: Bayes Rule
 C_i: the car is behind door i, i = 1, 2, 3
 P(C_i) = 1/3
 H_ij: the host opens door j after you pick door i
 P(H_ij | C_k) =
   0,   if i = j
   0,   if j = k
   1/2, if i = k
   1,   if i ≠ k and j ≠ k
Monty Hall Problem: Bayes Rule cont.
 WLOG, i = 1, j = 3
 P(C_1 | H_13) = P(H_13 | C_1) P(C_1) / P(H_13)
 P(H_13 | C_1) P(C_1) = (1/2) · (1/3) = 1/6
Monty Hall Problem: Bayes Rule cont.
 P(H_13) = P(H_13, C_1) + P(H_13, C_2) + P(H_13, C_3)
         = P(H_13 | C_1) P(C_1) + P(H_13 | C_2) P(C_2)      (the C_3 term vanishes since P(H_13 | C_3) = 0)
         = 1/6 + 1 · (1/3)
         = 1/2
 P(C_1 | H_13) = (1/6) / (1/2) = 1/3
Monty Hall Problem: Bayes Rule cont.
 P(C_1 | H_13) = (1/6) / (1/2) = 1/3
 P(C_2 | H_13) = 1 − 1/3 = 2/3 > P(C_1 | H_13)
 You should switch!
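A quick simulation (my own sketch, not from the slides) that confirms the 1/3 vs. 2/3 result above:

```python
import random

def play(switch: bool, trials: int = 100_000) -> float:
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)                    # door hiding the car
        pick = 0                                     # WLOG you pick door 1
        # the host opens a goat door that is not your pick
        host = random.choice([d for d in range(3) if d != pick and d != car])
        if switch:
            pick = next(d for d in range(3) if d != pick and d != host)
        wins += (pick == car)
    return wins / trials

print("stay:  ", play(switch=False))   # ~1/3
print("switch:", play(switch=True))    # ~2/3
```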
Continuous Random Variables
 What if X is continuous?
 Probability density function (pdf) instead of
probability mass function (pmf)
 A pdf is any function f ( x ) that describes the
probability density in terms of the input
variable x.
PDF
 Properties of pdf
 f(x) ≥ 0, ∀x
 ∫_{−∞}^{+∞} f(x) dx = 1
 f(x) ≤ 1 ???
 Actual probability can be obtained by taking the integral of the pdf
 E.g. the probability of X being between 0 and 1 is
 P(0 ≤ X ≤ 1) = ∫_0^1 f(x) dx
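A sketch using a standard normal as the concrete pdf (my own choice of example): the total integral is 1, P(0 ≤ X ≤ 1) is the integral over [0, 1], and a pdf value itself can exceed 1.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

f = stats.norm(loc=0, scale=1).pdf

total, _ = quad(f, -np.inf, np.inf)   # integrates to 1
p01, _ = quad(f, 0, 1)                # P(0 <= X <= 1)
print(total, p01)                     # ~1.0, ~0.3413

# A pdf is a density, not a probability: it may exceed 1
narrow = stats.norm(loc=0, scale=0.1).pdf
print(narrow(0))                      # ~3.99 > 1
```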
Cumulative Distribution Function
 F_X(v) = P(X ≤ v)
 Discrete RVs
 F_X(v) = Σ_{v_i ≤ v} P(X = v_i)
 Continuous RVs
 F_X(v) = ∫_{−∞}^{v} f(x) dx
 d/dx F_X(x) = f(x)
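A sketch checking both relations for a standard normal (again my own choice of distribution): the cdf equals the integral of the pdf, and the pdf is the derivative of the cdf.

```python
from scipy import stats
from scipy.integrate import quad

dist = stats.norm(0, 1)
v = 0.5

via_integral, _ = quad(dist.pdf, -float("inf"), v)
print(dist.cdf(v), via_integral)                      # both ~0.6915

h = 1e-5                                              # numerical derivative of the cdf
print((dist.cdf(v + h) - dist.cdf(v - h)) / (2 * h))  # ~pdf(v)
print(dist.pdf(v))
```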
Common Distributions
 Normal: X ~ N(µ, σ²)
 f(x) = 1 / (√(2π) σ) · exp(−(x − µ)² / (2σ²)), x ∈ ℝ
 E.g. the height of the entire population
[Figure: plot of the standard normal pdf f(x) for x from −5 to 5]
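A sketch that evaluates the pdf formula above directly and checks it against SciPy's implementation (the parameter values are arbitrary):

```python
import numpy as np
from scipy import stats

mu, sigma = 1.5, 2.0
x = np.linspace(-5, 5, 11)

manual = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
print(np.allclose(manual, stats.norm.pdf(x, loc=mu, scale=sigma)))   # True
```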
Common Distributions cont.
 Beta: X ~ Beta(α, β)
 f(x; α, β) = x^(α−1) (1 − x)^(β−1) / B(α, β), x ∈ [0, 1]
 α = β = 1: uniform distribution between 0 and 1
 E.g. the conjugate prior for the parameter p in the Binomial distribution
[Figure: plot of a Beta pdf f(x) for x from 0 to 1]
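A sketch that evaluates the Beta pdf formula directly and illustrates the conjugacy mentioned above: a Beta(α, β) prior on p updated with (heads, tails) coin-flip counts gives a Beta(α + heads, β + tails) posterior. The specific parameters and counts are made up.

```python
import numpy as np
from scipy import stats
from scipy.special import beta as beta_fn

a, b = 2.0, 3.0
x = np.linspace(0.01, 0.99, 9)
manual = x ** (a - 1) * (1 - x) ** (b - 1) / beta_fn(a, b)
print(np.allclose(manual, stats.beta.pdf(x, a, b)))          # True

heads, tails = 7, 3                                           # observed coin flips
posterior = stats.beta(a + heads, b + tails)                  # conjugate update
print("posterior mean of p:", posterior.mean())
```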
Joint Distribution
 Given two continuous RVs X and Y, the joint
pdf can be written as f_{X,Y}(x, y)
 ∫_x ∫_y f_{X,Y}(x, y) dx dy = 1
Multivariate Normal
 Generalization to higher dimensions of the
one-dimensional normal
 f_X(x_1, …, x_d) = 1 / ((2π)^(d/2) |Σ|^(1/2)) · exp(−(1/2) (x − µ)^T Σ^(−1) (x − µ))
 Σ: covariance matrix; µ: mean vector
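A sketch of a 2-dimensional multivariate normal with an assumed mean vector and covariance matrix, comparing SciPy's pdf with the formula above evaluated directly.

```python
import numpy as np
from scipy import stats

mu = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])      # symmetric positive definite covariance

mvn = stats.multivariate_normal(mean=mu, cov=Sigma)
x = np.array([0.5, 0.0])
print(mvn.pdf(x))

d = len(mu)
diff = x - mu
direct = np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / (
    (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma)))
print(direct)                        # same value
```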
Moments
 Mean (Expectation): µ = E(X)
 Discrete RVs: E(X) = Σ_{v_i} v_i P(X = v_i)
 Continuous RVs: E(X) = ∫_{−∞}^{+∞} x f(x) dx
 Variance: V(X) = E((X − µ)²)
 Discrete RVs: V(X) = Σ_{v_i} (v_i − µ)² P(X = v_i)
 Continuous RVs: V(X) = ∫_{−∞}^{+∞} (x − µ)² f(x) dx
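A sketch computing the mean and variance of a small discrete RV straight from the definitions above; the values and probabilities are made up.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0])
probs = np.array([0.2, 0.5, 0.3])            # a valid pmf (sums to 1)

mean = np.sum(values * probs)                # E(X) = sum_i v_i P(X = v_i)
var = np.sum((values - mean) ** 2 * probs)   # V(X) = sum_i (v_i - mean)^2 P(X = v_i)
print(mean, var)                             # 2.1, 0.49
```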
Properties of Moments
 Mean
 E(X + Y) = E(X) + E(Y)
 E(aX) = aE(X)
 If X and Y are independent, E(XY) = E(X) · E(Y)
 Variance
 V(aX + b) = a² V(X)
 If X and Y are independent, V(X + Y) = V(X) + V(Y)
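A sketch that checks each property by Monte Carlo simulation with two independent samples (the distributions and constants are arbitrary choices).

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(2.0, 1.0, size=1_000_000)
Y = rng.exponential(3.0, size=1_000_000)     # drawn independently of X
a, b = 4.0, 7.0

print(np.mean(X + Y), np.mean(X) + np.mean(Y))    # E(X + Y) = E(X) + E(Y)
print(np.mean(a * X), a * np.mean(X))             # E(aX) = aE(X)
print(np.mean(X * Y), np.mean(X) * np.mean(Y))    # E(XY) = E(X)E(Y) for independent X, Y
print(np.var(a * X + b), a ** 2 * np.var(X))      # V(aX + b) = a^2 V(X)
print(np.var(X + Y), np.var(X) + np.var(Y))       # V(X + Y) = V(X) + V(Y) for independent X, Y
```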
Moments of Common Distributions
 Uniform: X ~ U[1, …, N]
 Mean (1 + N)/2; variance (N² − 1)/12
 Binomial: X ~ Bin(n, p)
 Mean np; variance np(1 − p)
 Normal: X ~ N(µ, σ²)
 Mean µ; variance σ²
 Beta: X ~ Beta(α, β)
 Mean α/(α + β); variance αβ / ((α + β)² (α + β + 1))
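A sketch checking the closed forms above against SciPy's built-in mean() and var() for arbitrary parameter values (note that SciPy's randint(1, N + 1) is the discrete uniform on 1, …, N).

```python
from scipy import stats

N, n, p = 10, 20, 0.3
mu, sigma = 1.5, 2.0
a, b = 2.0, 5.0

print(stats.randint(1, N + 1).mean(), (1 + N) / 2)                   # uniform mean
print(stats.randint(1, N + 1).var(), (N ** 2 - 1) / 12)              # uniform variance
print(stats.binom(n, p).mean(), n * p)                               # binomial mean
print(stats.binom(n, p).var(), n * p * (1 - p))                      # binomial variance
print(stats.norm(mu, sigma).mean(), mu)                              # normal mean
print(stats.norm(mu, sigma).var(), sigma ** 2)                       # normal variance
print(stats.beta(a, b).mean(), a / (a + b))                          # beta mean
print(stats.beta(a, b).var(), a * b / ((a + b) ** 2 * (a + b + 1)))  # beta variance
```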
Probability of Events
 X denotes an event that could possibly happen
 E.g. X=“you will fail in this course”
 P(X) denotes the likelihood that X happens,
or X=true
 What’s the probability that you will fail in this
course?
 Ω denotes the entire event set
 Ω = {X, X̄}
The Axioms of Probabilities
 0 ≤ P(X) ≤ 1
 P(Ω) = 1
 P(X_1 ∪ X_2 ∪ …) = Σ_i P(X_i), where the X_i are disjoint events
 Useful rules
 P(X_1 ∪ X_2) = P(X_1) + P(X_2) − P(X_1 ∩ X_2)
 P(X̄) = 1 − P(X)
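A sketch checking the two useful rules by enumerating a fair six-sided die; the two events are made up for illustration.

```python
import numpy as np

omega = np.arange(1, 7)            # the six equally likely outcomes
X1 = omega <= 3                    # event "roll is 1, 2, or 3"
X2 = omega % 2 == 0                # event "roll is even"
P = lambda event: event.mean()     # probability under the uniform distribution

print(P(X1 | X2), P(X1) + P(X2) - P(X1 & X2))   # inclusion-exclusion
print(P(~X1), 1 - P(X1))                        # complement rule
```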
Interpreting the Axioms

[Venn diagram: events X1 and X2 shown as overlapping regions]
