1

Stat E100


Unit 4: Probability
IPS Chapter 4
2
Unit 4 Outline: Probability
Definitions and Concepts in Probability
Rules of Probability
Independence
Conditional Probability
Introduction to Random Variables
Discrete vs. Continuous
Means (expected Values), Variances, and Correlations


3
Probability Terminology
Terminology
random phenomenon: a process whose individual outcomes are
uncertain, but whose outcomes follow a regular distribution over a
large number of repetitions.
Examples:
Coin tossing and dice rolling
The lottery and other games of chance
Drawing a random sample from some population

outcome: the value of one replication of a random experiment or
phenomenon.
Coin Tossing:
H with one toss of a coin
HTT with three tosses


4
Probability Terminology (cont.)
sample space (labeled S): the set of all possible outcomes of a
random phenomenon
Examples:
1. Toss a coin three times: S = {HHH, THH, HTH, ..., TTT}

2. Face showing when rolling a six-sided die: S = {1, 2, 3, 4, 5, 6}

3. Pick a real number between 1 and 20: S = [1, 20]

event (labeled A): a set of outcomes of a random phenomenon.
Examples:
1. The event A that exactly two heads are obtained when a coin
is tossed three times: A={HHT,HTH,THH}

2. The result of the toss of a fair die is an even number:
A = {2,4,6}

3. The number chosen from the set of all real numbers between
1 and 20 is at most 8.23: A = [1, 8.23]
5
Simple Probability
The probability of an event can be thought of as its long-run
proportion/frequency
For a random phenomenon, if the sample space is finite and if
all of the individual outcomes have the same probability,
then the probability of an event A (written P(A)) is the ratio

P(A) = (# of elements in A) / (# of elements in S)

Use this formula to determine the probability of getting two
heads in three tosses of a coin. What is the probability of getting
an even number in one roll of a die?
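The counting formula above can be checked by listing the sample space and counting. The following is a minimal sketch for the two questions on this slide; the helper name `prob` is illustrative, and the approach only applies when all outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product


def prob(event, sample_space):
    """P(A) = (# of elements in A) / (# of elements in S).

    Valid only when every outcome in sample_space is equally likely.
    """
    favorable = [o for o in sample_space if event(o)]
    return Fraction(len(favorable), len(sample_space))


# Three tosses of a fair coin: 8 equally likely outcomes such as "HHT".
coin_space = ["".join(t) for t in product("HT", repeat=3)]
p_two_heads = prob(lambda o: o.count("H") == 2, coin_space)

# One roll of a fair six-sided die.
die_space = list(range(1, 7))
p_even = prob(lambda o: o % 2 == 0, die_space)

print(p_two_heads, p_even)  # 3/8 1/2
```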
6
Events in Sample Spaces
(more Terminology)
The union of two events A and B is the event that either A occurs or B
occurs or both occur:
(A or B) = A ∪ B

The intersection of two events A and B is the event that both A and B
occur:
(A and B) = A ∩ B = AB

Set theory notation (the ∩ and ∪) is widely used, but not
needed in this class
The complement of an event A, A^c, is the event that A does not occur
and thus consists of outcomes that are not in A. The text just calls it
"not A"
7
Rules of Probability
Rule 1: The probability of any event A satisfies
0 ≤ P(A) ≤ 1

Rule 2: P(S) = 1. The probability of the sample space S is 1

Rule 3: P(A^c) = 1 − P(A). The probability of an event not
happening is 1 − (probability of the event happening)

Rule 4: If A and B are disjoint events then
P(A or B) = P(A) + P(B).
Note: the general rule is P(A or B) = P(A) + P(B) − P(A and B)

Justification for these can be seen in Venn Diagrams
8
Venn Diagram
Rule 3. For any event A, P(A^c) = 1 − P(A)
[Figure: Venn diagram in sample space S]
9
Venn Diagram
Rule 4. If A and B are disjoint events then
P(A or B) = P(A) + P(B)
A and B are disjoint (mutually exclusive):
- No outcomes in common
- Cannot happen at the same time
[Figure: Venn diagram in sample space S]
10
Venn Diagram
In general for any events A and B:
P(A or B) = P(A) + P(B) − P(A and B)
[Figure: Venn diagram in sample space S]
11
Independent events
There is a 5th "rule" for independent pairs of events
Motivation on next slide, using coin tossing
Rule 5: Two events A and B are independent if and only if
knowing that one event occurs does not change the probability
that the other event occurs. If A and B are independent, then:
P(A and B) = P(A)P(B)
Sometimes called the multiplication rule for independent
events.
Does knowing the result of flipping a fair coin once affect the
chances of heads on a 2nd flip?
Note: Independence can't be easily drawn in a Venn Diagram
12
An Example
There is a bag with 3 balls in it: 1 is red, and 2 are black
You draw two balls out of the bag, one at a time (without
replacement). Define the events:
A: the first ball drawn is black
B: the second ball drawn is black

Are A and B independent?
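One way to explore this question is to list every ordered draw and compare P(A and B) with P(A)P(B), which is the check from Rule 5. The sketch below does that; the ball labels `R`, `B1`, `B2` and the helper `prob` are illustrative names, not part of the slides.

```python
from fractions import Fraction
from itertools import permutations

# Bag from the example: 1 red ball and 2 black balls (labels are illustrative).
bag = ["R", "B1", "B2"]

# All ordered draws of two balls without replacement: 6 equally likely outcomes.
draws = list(permutations(bag, 2))


def prob(event):
    """Proportion of the equally likely draws for which the event holds."""
    return Fraction(sum(1 for d in draws if event(d)), len(draws))


p_a = prob(lambda d: d[0].startswith("B"))  # A: first ball is black
p_b = prob(lambda d: d[1].startswith("B"))  # B: second ball is black
p_a_and_b = prob(lambda d: d[0].startswith("B") and d[1].startswith("B"))

# Rule 5: independent if and only if P(A and B) = P(A) P(B).
independent = p_a_and_b == p_a * p_b
print(p_a, p_b, p_a_and_b, independent)  # 2/3 2/3 1/3 False
```

Since P(A)P(B) = 4/9 while P(A and B) = 1/3, the two draws are not independent when sampling without replacement.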
14
Conditional probability
conditional probability: the probability of one event occurring
under the condition that we know the outcome of another event

Let A and B be two events in a sample space, with P(A) > 0. The
conditional probability of event B, given that A has occurred,
written P(B|A), is

$$P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}$$

P(B|A) is read as "probability of B, given A has happened", or
"probability of B if A is true".
Note that if A and B are independent, P(B|A) = P(B)
Conditional probability can get tricky!!!
Simple Example
There is a bag with 3 balls in it: 1 is red, and 2 are black
You draw two balls out of the bag, one at a time (without
replacement). Define the events:
A: the first ball drawn is black
B: the second ball drawn is black

$$P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)} = \frac{P(\text{Both Balls are Black})}{P(\text{First Ball is Black})} = \frac{2/6}{4/6} = \frac{1}{2}$$

[Figure: outcomes by First Ball and Second Ball]
16
Very tricky
The Monty Hall Problem
There are prizes behind 3 doors: two are worthless (an
ant farm) and one is expensive (like a new car)
You are asked to choose one of the 3 doors
Then, Monty Hall (from "Let's Make a Deal") opens one
of the other 2 doors and shows you a worthless prize

Should you switch doors?

NYTimes take:
http://www.nytimes.com/2008/04/08/science/08monty.html
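The question can be explored by enumerating every case exactly rather than arguing informally. The sketch below assumes the car is placed uniformly at random, your first pick is uniformly random, and Monty always opens a worthless door among the other two (choosing at random when he has two options). These assumptions are spelled out here for illustration and go slightly beyond what the slide states.

```python
from fractions import Fraction
from itertools import product

doors = [0, 1, 2]
p_stay = Fraction(0)
p_switch = Fraction(0)

# Each (car location, first pick) pair has probability 1/9 under the
# assumptions stated above.
for car, pick in product(doors, repeat=2):
    # Doors Monty may open: not your pick and not the car.
    openable = [d for d in doors if d != pick and d != car]
    for opened in openable:
        weight = Fraction(1, 9) / len(openable)
        # The door you would end up with if you switch.
        switched = next(d for d in doors if d != pick and d != opened)
        p_stay += weight * (pick == car)
        p_switch += weight * (switched == car)

print(p_stay, p_switch)  # 1/3 2/3
```

Under these assumptions switching wins twice as often as staying, because switching wins exactly when the first pick was a worthless door.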
17
A general multiplication rule
(from conditional probability)
Suppose A and B are two events in a sample space (not
necessarily independent). Then
P(A and B) = P(B | A) P(A)
P(A and B and C) = P(C | A and B) [P(B | A) P(A)]

The first relationship is a simple algebraic rearrangement of
the definition of conditional probability:

$$P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}$$
18
A Simple Example
It is known that approximately 20% of men and 3% of women are taller
than 6 feet in the US.

Let F = the event that someone is female and T = taller than 6 feet.

a) What is P(T | F)? What is P(T | F^c)?

b) What is the probability that the next person walking through the door
is a woman and 6 feet tall?

c) What is the probability that the next person walking through the door
is 6 feet tall?
19
Simple Example (cont.)
c) What is the probability that the next person walking through the
door is 6 feet tall?

Two ways for this to happen: (T and F) or (T and F^c) [Think Venn Diagrams]

$$\begin{aligned}
P(T) &= P(T \text{ and } F) + P(T \text{ and } F^c) \\
&= P(T \mid F)\,P(F) + P(T \mid F^c)\,P(F^c) \\
&= (0.03)(0.50) + (0.20)(0.50) = 0.115
\end{aligned}$$
2-way tables can help organize your thinking

                 Tall (6' or more): Yes       Tall (6' or more): No
Female: Yes      P(F and T)                   P(F)*P(not T | F)
                 = P(F)*P(T | F)              = (0.5)*(0.97)
                 = (0.5)*(0.03)               = 0.485
                 = 0.015
Female: No       P(not F)*P(T | not F)        P(not F)*P(not T | not F)
                 = (0.5)*(0.20)               = (0.5)*(0.80)
                 = 0.100                      = 0.400
21
Bayes' Rule
Bayes's rule (formula) provides a way to go from P(B | A) to
P(A | B) (they are in general not equal)
If A and B are two events whose probabilities are
not 0 or 1, then

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid A^c)\,P(A^c)}$$
22
Conditional probability and Bayes' Rule
d) What is the probability that a randomly selected 6
foot tall person is female?

$$\begin{aligned}
P(F \mid T) &= \frac{P(F \text{ and } T)}{P(T)} = \frac{P(T \mid F)\,P(F)}{P(T \mid F)\,P(F) + P(T \mid F^c)\,P(F^c)} \\
&= \frac{(0.03)(0.50)}{(0.03)(0.50) + (0.20)(0.50)} = \frac{0.015}{0.115} = 0.130
\end{aligned}$$

Or, you can just use the 2x2 table from 2 slides earlier.
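The height example can be reproduced step by step: first the joint probabilities (the cells of the 2-way table), then the total probability P(T), then Bayes' rule for P(F | T). This is a small sketch using the numbers from the slides, including P(F) = 0.50; the variable names are illustrative.

```python
# Inputs from the example; P(F) = 0.50 is the value used on the slides.
p_f = 0.50
p_t_given_f = 0.03      # P(T | F): women taller than 6 feet
p_t_given_not_f = 0.20  # P(T | F^c): men taller than 6 feet

# Joint probabilities via the general multiplication rule.
p_f_and_t = p_t_given_f * p_f
p_not_f_and_t = p_t_given_not_f * (1 - p_f)

# Total probability of T, then Bayes' rule for P(F | T).
p_t = p_f_and_t + p_not_f_and_t
p_f_given_t = p_f_and_t / p_t

print(round(p_t, 3), round(p_f_given_t, 3))  # 0.115 0.13
```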
24
Random Variables
A random variable is a variable whose value is a
numerical outcome of a random phenomenon.
Usually denoted by capital letters X, Y or Z
Example: toss a coin three times, let X = number
of observed heads.

S = {TTT,HTT,THT,TTH,THH,HTH,HHT,HHH}


values for X = {0, 1, 2, 3}

25
Random variables versus data
There is an important distinction between random variables
($X_i$) and realized observed data ($x_i$)
A random variable is theoretical and has not yet been
observed, but it has the potential to take different values with
certain probabilities
The larger the number of observations (n), the more the
histogram of the observed data $x_1, x_2, \ldots, x_n$ resembles the
probability distribution of the (theoretical) random variable ($X_i$)
We have been using random variables all along, but we called
them "variables": responses to a random sample/poll,
public transit data, toddlers' nutritional intake, etc.
26
Discrete Random Variables
In the coin tossing example, the number of heads, X, is called a
discrete random variable: a variable with a finite number
of distinct values
X has a simple distribution of values with associated
probabilities. This is the theoretical analogue of the
observed frequency distribution in a set of numbers.
For X, we can summarize its distribution in a table or
histogram
27
Formal Terminology
A Discrete Random Variable X can take on values in a finite set
with k members. We denote these possible outcomes by
$x_1, x_2, x_3, x_4, x_5, \ldots, x_k$

The Probability Distribution of X is specified by assigning
probabilities to each of the possible outcomes. The probability
distribution tells us how likely each outcome is.

The probabilities assigned to each outcome must satisfy the
following conditions:
For each $x_i$, $P(X = x_i)$ is between 0 and 1.
That is, $0 \le P(X = x_i) \le 1$ for all possible values of $x_i$
The sum of the probabilities equals 1.
That is, $P(X = x_1) + P(X = x_2) + \cdots + P(X = x_k) = 1$


28
A coin tossing random variable
Let X be the number of heads when a fair coin is tossed
four times (whoa, way more complex!?)
S = {HHHH, HHHT, HHTH, ..., TTTT}
P(HHHH) = P(HHHT) = ... = P(TTTT) = 1/16
X is a discrete random variable with 5 possible values
{0, 1, 2, 3, 4}
Its probability distribution is:
Value 0 1 2 3 4
Probability 1/16 4/16 6/16 4/16 1/16
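The table above can be rebuilt by listing all 16 outcomes and counting heads in each. A minimal sketch follows; note that Python's `Fraction` prints reduced fractions, so 4/16 appears as 1/4 and 6/16 as 3/8.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2^4 = 16 equally likely outcomes of four tosses of a fair coin.
outcomes = list(product("HT", repeat=4))

# X = number of heads; count how many outcomes give each value of X.
counts = Counter(o.count("H") for o in outcomes)

# Probability distribution of X as {value: probability}.
dist = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x, p in dist.items():
    print(x, p)
```

The probabilities are each between 0 and 1 and sum to 1, as required of any discrete probability distribution.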
29
Possible outcomes in 4 tosses of a coin
30
Probability distribution for X, the number of
heads when a coin is tossed 4 times
31
Defining a Discrete Random Variable's
Probability Distribution
The probability distribution for a discrete random variable X
can be completely defined when depicted in any of 3 forms:
1) Tabular form (shown 3 slides earlier)
2) Graphically (last slide)
3) By a formula (example coming in next unit)

Note: When k gets large, the formula based approach is often
the only feasible choice.
32
Expected value (aka mean) of a discrete
random variable
Sometimes called "average" or "mean" value of a random variable
Motivated by X = number of heads in 4 tosses of a coin: |S| = 16.
S = {HHHH, THHH, HTHH, HHTH, HHHT, TTHH, ..., TTTT}
Average value of X, using its population of values:
1/16 x (4 + 3 + 3 + 3 + 3 + 2 + ... + 0), or

(4 x 1/16) + (3 x 4/16) + (2 x 6/16)
+ (1 x 4/16) + (0 x 1/16)
= 32/16 = 2.0

Can be expressed as:
$$E(X) = \mu_X = \sum x_i \, P(X = x_i) = x_1 p_1 + x_2 p_2 + \cdots + x_k p_k$$

[Figure: probability distribution of X, with center at 2]
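The weighted sum for the expected value translates directly into code. The sketch below applies it to the distribution of the number of heads in 4 tosses; the function name `expected_value` is illustrative.

```python
from fractions import Fraction


def expected_value(dist):
    """E(X) = sum of x_i * P(X = x_i) over all possible values x_i."""
    return sum(x * p for x, p in dist.items())


# Distribution of X = number of heads in 4 tosses of a fair coin.
dist = {
    0: Fraction(1, 16),
    1: Fraction(4, 16),
    2: Fraction(6, 16),
    3: Fraction(4, 16),
    4: Fraction(1, 16),
}

mu = expected_value(dist)
print(mu)  # 2
```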
33
Mean of a discrete random variable
If X is a discrete random variable with k possible values, its
probability distribution is:
Value:        $x_1$  $x_2$  $x_3$  ...  $x_k$
Probability:  $p_1$  $p_2$  $p_3$  ...  $p_k$

Mean (or expected value) of X, denoted $\mu_X$, is given by:

$$\mu_X = E(X) = \sum x_i \, P(X = x_i) = x_1 p_1 + x_2 p_2 + \cdots + x_k p_k$$

If X represents a measure on some member of a population, then
E(X) is the population mean of this measure


34
Variance of a discrete random variable
If X is a discrete random variable with k possible values, its
probability distribution is:
Value:        $x_1$  $x_2$  $x_3$  ...  $x_k$
Probability:  $p_1$  $p_2$  $p_3$  ...  $p_k$

The variance of X, denoted $\sigma_X^2$, is given by
$$\sigma_X^2 = (x_1 - \mu_X)^2 p_1 + (x_2 - \mu_X)^2 p_2 + \cdots + (x_k - \mu_X)^2 p_k$$

This is sometimes written as:
$$\mathrm{Var}(X) = E[(X - \mu_X)^2] = \sum_{i=1}^{k} (x_i - \mu_X)^2 \, p_i$$

Standard deviation for a RV is the square root of variance:
$$\mathrm{SD}(X) = \sqrt{\mathrm{Var}(X)} = \sqrt{E[(X - \mu_X)^2]}$$
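As a worked instance of the variance and standard deviation formulas, the sketch below reuses the distribution of X = number of heads in 4 tosses of a fair coin from the earlier slides. The variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt

# Distribution of X = number of heads in 4 tosses of a fair coin.
dist = {
    0: Fraction(1, 16),
    1: Fraction(4, 16),
    2: Fraction(6, 16),
    3: Fraction(4, 16),
    4: Fraction(1, 16),
}

# Mean: sum of x_i * p_i.
mu = sum(x * p for x, p in dist.items())

# Variance: sum of (x_i - mu)^2 * p_i.
var = sum((x - mu) ** 2 * p for x, p in dist.items())

# Standard deviation: square root of the variance.
sd = sqrt(var)

print(mu, var, sd)  # 2 1 1.0
```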
35
Correlation between 2 Discrete RVs
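Recall that the correlation for two variables, when we have
observations on the variables, is
$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$
The definition of correlation for two random variables X and Y is:
$$\rho_{XY} = E\left[\left(\frac{X - \mu_X}{\sigma_X}\right) \left(\frac{Y - \mu_Y}{\sigma_Y}\right)\right]$$
**We'll never have you calculate this by hand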
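Although the slides do not ask for hand calculation, a small computed case can make the definition concrete. The example below is hypothetical and not from the slides: toss a fair coin three times, let X be the number of heads in tosses 1 and 2, and let Y be the number of heads in tosses 2 and 3. The two variables share toss 2, so they should be positively correlated.

```python
from itertools import product
from math import sqrt

# Hypothetical setup: three fair coin tosses, 1 = heads, 0 = tails.
outcomes = list(product([0, 1], repeat=3))
p = 1 / len(outcomes)  # each of the 8 outcomes is equally likely


def expect(f):
    """Expected value of f(outcome) over the equally likely outcomes."""
    return sum(f(o) * p for o in outcomes)


def X(o):
    return o[0] + o[1]  # heads in tosses 1 and 2


def Y(o):
    return o[1] + o[2]  # heads in tosses 2 and 3


mu_x, mu_y = expect(X), expect(Y)
sd_x = sqrt(expect(lambda o: (X(o) - mu_x) ** 2))
sd_y = sqrt(expect(lambda o: (Y(o) - mu_y) ** 2))

# Correlation: expected product of the standardized variables.
rho = expect(lambda o: ((X(o) - mu_x) / sd_x) * ((Y(o) - mu_y) / sd_y))
print(round(rho, 3))  # 0.5
```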
36
Correlation ($\rho$) of Random Variables
Correlation for random variables has the same properties and
interpretation as with data
$\rho_{XY} > 0$ means that when X tends to be larger than its mean,
Y will tend to be larger than its mean
$\rho_{XY} < 0$ means that when $X > \mu_X$, Y tends to be smaller than its mean
$-1 \le \rho_{XY} \le 1$
The use of correlation in applications is more important than the
formula/definition of $\rho$

37
Continuous random variables
A continuous random variable is a variable taking all
possible values in an interval of numbers
If X is a continuous random variable, its probability
distribution is described by a density curve (defined
graphically or as a formula)
The probability of an event is the area under the
curve above the values of X that make up the event
What density curve have we already talked about in
detail?
38
39
Rules for variances
(IPS pg. 271)

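Rule 1: If X is a random variable with variance $\sigma_X^2$ and a and b
are fixed constants, then
$\mathrm{var}(a + bX) = b^2 \sigma_X^2$

Rule 2: If X and Y have correlation $\rho$ then:
$\mathrm{var}(X + Y) = \sigma_X^2 + \sigma_Y^2 + 2\rho\,\sigma_X \sigma_Y$
$\mathrm{var}(X - Y) = \sigma_X^2 + \sigma_Y^2 - 2\rho\,\sigma_X \sigma_Y$
$\mathrm{var}(aX \pm bY) = a^2 \sigma_X^2 + b^2 \sigma_Y^2 \pm 2\rho\,(a)(b)\,\sigma_X \sigma_Y$

If X and Y are not correlated, $\rho = 0$ and the above simplifies to:
$\mathrm{var}(X + Y) = \mathrm{var}(X - Y) = \sigma_X^2 + \sigma_Y^2$
$\mathrm{var}(aX \pm bY) = a^2 \sigma_X^2 + b^2 \sigma_Y^2$
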

40
Example linear combo of random variables
Yearly vet costs for cats have an average of $200 and a sd of $100.
For dogs, these costs average $250 with a standard deviation of $150.
You can expect these costs to be independent.

a) What is the expected total yearly vet cost for someone who owns
one cat and one dog?

$\mathrm{Mean}(1X + 1Y) = 1(\mu_X) + 1(\mu_Y) = 200 + 250 = \$450$


b) What is the standard deviation of the total yearly vet cost for
someone who owns one cat and one dog?

$$\begin{aligned}
\mathrm{Var}(1X + 1Y) &= 1^2 \sigma_X^2 + 1^2 \sigma_Y^2 + 2\rho\,(1)(1)\,\sigma_X \sigma_Y \\
&= 1^2 (100)^2 + 1^2 (150)^2 + 2(0)(1)(1)(100)(150) \\
&= 32500
\end{aligned}$$
$\mathrm{SD}(X + Y) = \sqrt{32500} = \$180.28$
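The vet-cost calculation can be reproduced with a small helper for the variance of a linear combination. In this sketch X is taken to be the yearly cat cost and Y the yearly dog cost, and independence is treated as correlation 0, as in the example; the function name `var_linear_combo` is illustrative.

```python
from math import sqrt


def var_linear_combo(a, b, sd_x, sd_y, rho):
    """var(aX + bY) = a^2 sd_x^2 + b^2 sd_y^2 + 2 rho a b sd_x sd_y."""
    return a**2 * sd_x**2 + b**2 * sd_y**2 + 2 * rho * a * b * sd_x * sd_y


# Assumed labels: X = yearly cat cost, Y = yearly dog cost (dollars).
mean_x, sd_x = 200, 100
mean_y, sd_y = 250, 150

mean_total = 1 * mean_x + 1 * mean_y
var_total = var_linear_combo(1, 1, sd_x, sd_y, rho=0)
sd_total = sqrt(var_total)

print(mean_total, var_total, round(sd_total, 2))  # 450 32500 180.28
```

Passing a negative `b` gives the variance of a difference; with `rho=0` the result is the same as for the sum, matching the simplified rule for uncorrelated variables.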
