Introduction To Probability

INTRODUCTION
TO
PROBABILITY
William J. Anderson
McGill University
Contents
1 Introduction and Definitions.
1.1 Basic Definitions. . . . . . . . . . . . . . . . . . .
1.2 Permutations and Combinations. . . . . . . . .
1.3 Conditional Probability and Independence. .
1.4 Bayes Rule and the Law of Total Probability.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
5
5
11
15
20
2 Discrete Random Variables.

2.1 Basic Definitions. . . . . . . . . . . . . . . . .
2.2 Special Discrete Distributions. . . . . . . .
2.2.1 The Binomial Distribution. . . . . . .
2.2.2 The Geometric Distribution. . . . . .
2.2.3 The Negative Binomial Distribution.
2.2.4 The Hypergeometric Distribution. .
2.2.5 The Poisson Distribution. . . . . . .
2.3 Moment Generating Functions. . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
23
23
27
27
29
30
31
31
32
.
.
.
.
.
.
.
.
.
.
35
35
37
40
40
42
43
45
49
49
50
3 Continuous Random Variables.

3.1 Distribution Functions. . . . . . . . .
3.2 Continuous Random Variables. . . .
3.3 Special Continuous Distributions. .
3.3.1 The Uniform Distribution. . .
3.3.2 The Exponential Distribution.
3.3.3 The Gamma Distribution. . .
3.3.4 The Normal Distribution. . .
3.3.5 The Beta Distribution. . . . .
3.3.6 The Cauchy Distribution. . .
3.4 Chebychevs Inequality. . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
4 Multivariate Distributions.
4.1 Definitions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.2 Marginal Distributions and the Expected Value of Functions of
Variables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.2.1 Special Theorems. . . . . . . . . . . . . . . . . . . . . . . . . .
4.2.2 Covariance. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.3 Conditional Probability and Density Functions. . . . . . . . . . . .
3
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
. . . . . .
Random
. . . . . .
. . . . . .
. . . . . .
. . . . . .
51
51
53
54
54
59
CONTENTS
4.4
4.5
4.6
4.7
Independent Random Variables. . . . . . . . . . . . . .

The Expected Value and Variance of Linear Functions
The Multinomial Distribution. . . . . . . . . . . . . . . .
More than Two Random Variables. . . . . . . . . . . .
4.7.1 Definitions. . . . . . . . . . . . . . . . . . . . . . .
4.7.2 Marginal and Conditional Distributions. . . . .
4.7.3 Expectations and Conditional Expectations. . .
. . . . . . . . . . . . .
of Random Variables.
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
61
63
67
68
68
69
70
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
71
71
71
73
74
74
74
75
75
6 Law of Large Numbers and the Central Limit Theorem.

6.1 Law of Large Numbers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
6.2 The Central Limit Theorem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
79
79
80
5 Functions of Random Variables.

5.1 Functions of Continuous Random Variables. . . . . . .
5.1.1 The Univariate Case. . . . . . . . . . . . . . . . .
5.1.2 The Multivariate Case. . . . . . . . . . . . . . .
5.2 Sums of Independent Random Variables. . . . . . . . .
5.2.1 The Discrete Case. . . . . . . . . . . . . . . . . . .
5.2.2 The Jointly Continuous Case. . . . . . . . . . . .
5.3 The Moment Generating Function Method. . . . . . . .
5.3.1 A Summary of Moment Generating Functions.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Chapter 1
Introduction and Definitions.
1.1
Basic Definitions.
Definition. An experiment E is a procedure which can result in one or several outcomes. The set of all possible outcomes of an experiment is called the sample space S
(more commonly ). A generic outcome will be denoted by . An event is a subset of
the sample space. Events are usually denoted by upper case letters near the beginning
of the alphabet, like A, B, C. An event which consists of only one outcome is called a
simple (or elementary event); otherwise it is a compound event.
Examples.
(1) Toss a coin. Then S = {H, T }. A = {H} is a simple event. We can also write
A = {get a head}.
(2) Toss a die. Then S = {1, 2, 3, 4, 5, 6}. A = {2, 4, 6} is an event. We could also write
A = {get an even number}. Another event is B = {1, 3, 4, 6}. C = {6} is a simple
event.
(3) Toss two dice. Then
(1, 1)
(2, 1)
(3, 1)
S=
(4, 1)
(5, 1)
(6, 1)
(1, 2)
(2, 2)
(3, 2)
(4, 2)
(5, 2)
(6, 2)
(1, 3)
(2, 3)
(3, 3)
(4, 3)
(5, 3)
(6, 3)
(1, 4)
(2, 4)
(3, 4)
(4, 4)
(5, 4)
(6, 4)
(1, 5)
(2, 5)
(3, 5)
(4, 5)
(5, 5)
(6, 5)
(1, 6)
(2, 6)
(3, 6)
.
(4, 6)
(5, 6)
(6, 6)
Some examples of events are

A = {(2, 3), (5, 5), (6, 4)},
B = {sum of 5} = {(1, 4), (2, 3), (3, 2), (4, 1)},
C = {(6, 2)}.
C is a simple event. A and B are compound.
5
CHAPTER 1. INTRODUCTION AND DEFINITIONS.
So far, these are all finite sample spaces.

(4) Toss a coin until you get a head. Then
S = {H, T H, T T H, T T T H, . . .}.
This sample space is infinite but countable. A sample space which is either finite or
countably infinite is called a discrete sample space.
(5) Spin a spinner.
Assuming we can measure the rest angle to any degree of accuracy, then S =
[0, 360) and is uncountably infinite. Some examples of events are
A = [15.0, 85.0],
B = (145.6678, 279.5000],
C = {45.7}.
A and B are compound and C is a simple event.
Combinations of Events.
If A, B are events, then
(1) A B is the set of outcomes that belong to A or to B, or to both,

(2) A B is the set of outcomes that belong to both A and to B.
(3) Ac (complement of A) is the set of outcomes not in A,
(4) A \ B =def A B c .
The empty event will be denoted by . Two events A and B are mutually exclusive
if A B = .
1.1. BASIC DEFINITIONS.
Terminology. If when the experiment is performed, the outcome occurs, and if A

is an event which contains , then we say A occurs. For this reason,
(1) A B is called the event that A or B occurs, or just A or B. The rationale is that
A B occurs iff the experiment results in an outcome in A B. This outcome
must be in A or in B, or in both, in which case we say A occurs, or B occurs, or
both occur.
(2) A B is called the event that A and B occur, or just A and B.
(3) Ac is called the event that A does not occur, or just not A.
(4) A \ B is the event that A occurs but not B.
(5) is called the impossible event (why?) S is called the "sure event".
More generally, if A1 , . . . , An are events in a sample space, then
(1) n
i=1 Ai is the event that at least one of the Ai s occurs,
(2) n
i=1 Ai is the event that all of the Ai s occur.
Definition. Let S be the sample space for some experiment. A probability on S is a
rule which associates with every event in S a number in [0, 1] such that
(1) P (S) = 1,
(2) if A1 , A2 , . . . is a sequence of pairwise mutually exclusive (i.e. Ai Bj = for i j)
events, then
P (Ai ).
P (i=1 Ai ) =
i=1
That is, P (A1 A2 ) = P (A1 ) + P (A2 ) + .

Definition.
If P is a probability on S, the pair (S, P ) is called a probability space.
Proposition 1.1.1. Let P be a probability on S.

(1) P () = 0,
(2) If A1 , A2 , . . . , An are pairwise mutually exclusive events, then
P (n
i=1 Ai ) =
n
X
P (Ai ).
i=1
That is, P (A1 A2 An ) = P (A1 ) + P (A2 ) + + P (An ).

Proof.

(1) We can write S = S . Taking probabilities and using the two axioms
gives
1 = P (S) = P (S) + P () + P () + = 1 + P () + P () + ,
so we must have P () = 0.
(2) We can write
A1 A2 An = A1 A2 An . . . ,
so taking probabilities gives
P (A1 A2 An ) = P (A1 A2 An )
= P (A1 ) + P (A2 ) + + P (An ) + P () + P () +
= P (A1 ) + P (A2 ) + + P (An ).
Proposition 1.1.2 (Rules for Computing Probabilities).
(1) P (Ac ) = 1 P (A).
(2) If A B, then
(a) P (B \ A) = P (B) P (A),
(b) P (A) P (B).
(3) P (A B) = P (A) + P (B) P (A B).
Proof.
(1) We have S = A Ac disjoint, so 1 = P (S) = P (A Ac ) = P (A) + P (Ac ).
(2) We have B = A (B \ A) disjoint, so P (B) = P (A) + P (B \ A).
(3) We have A B = A [B \ (A B)] disjoint, so P (A B) = P (A) + P [B \ (A B)] =
P (A) + P (B) P (A B).
A
B
Problem. Show that
P (A B C) = P (A) + P (B) + P (C) P (A B) P (A C) P (B C) + P (A B C).
Calculating Probabilities of Events in a Discrete Sample Space. In a discrete sample

space, every event A can be expressed as the union of elementary events. Axiom 2
in the definition of a probability then says that the probability of A is the sum of the
probabilities of these elementary events. Thus, if A = {a1 , . . . , an }, then we can write
A = {a1 } {a2 } {an } and so
P (A) = P (a1 ) + P (a2 ) + + P (an ).
(Note: we are writing P ({ai }) as P (ai )). Thus the probability of an event is the sum of
the probabilities of the outcomes in that event. In particular, the sum of the probabilities of all the outcomes in S must be 1.
Example. Suppose a die has probabilities

P (1) = .1, P (2) = .1, P (3) = 0.2, P (4) = .3, P (5) = .1, P (6) = .2
What is the probability of getting an even number when the die is tossed?
Solution. Let A = {2, 4, 6}. Then

P (A) = P (2) + P (4) + P (6) = .1 + .3 + .2 = .6
Definition. A finite sample space is said to be equiprobable if every outcome has the
same probability of occurring.
Proposition 1.1.3. Let A be an event in an equiprobable sample space S. Then
P (A) =
|A|
,
|S|
where |A| is the number of outcomes in A.

Proof. Suppose the probability of any one outcome in S is p. Then 1 = P (S) = |S|p, so
p = 1/|S|, so P (A) = |A|p = |A|
.
|S|
Example. Two balanced dice are rolled. What is the probability of getting a sum of
seven?
Solution. {sum of 7} = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}, so P [sum of 7] = 6/36.
10
Example. A tank has three red fish and two blue fish. Two fish are chosen at random
and without replacement. What is the probability of getting
(1) red fish first and then a blue fish?
(2) both fish red?
(3) one red fish and one blue fish?
Note: without replacementmeans a first fish is chosen from the five, and then a
second fish is chosen from the remaining four. at randommeans every pair of fish
chosen in this way has the same probability.
Solution. List the sample space. Let the fish be R1 , R2 , R3 and B1 , B2 . Then
R1 R2
R R
1 3
S=
R
1 B1
R1 B2
R2 R1
R2 R3
R2 B1
R2 B2
R3 R1
R3 R2
R3 B1
R3 B2
B1 R1
B1 R2
B1 R3
B1 B2
B2 R1
B2 R2
B2 R3
B2 B1
Since fish are chosen at random, then S is equiprobable.

(1) P {red fish first, then blue fish} = P {R1 B1 , R1 B2 , R2 B1 , R2 B2 , R3 B1 , R3 B2 } = 6/20.
(2) P {both fish red} = 6/20.
(3) P {one red, one blue} = 12/20.
If part (1) of the question were not there, we could use an unordered sample space
and answer as follows:
Solution. List the sample space. Let the fish be R1 , R2 , R3 and B1 , B2 . Then
R1 , R2 R2 , R3 R3 , B1 B1 , B2
R ,R R ,B R ,B
1
3
2
1
3
2
S=
R
,
B
R
,
B
1
1
2
2
R1 , B2
Since fish are chosen at random, then S is equiprobable.

(1) P {both fish red} = P {R1 R2 , R1 R3 , R2 R3 } = 3/10.
(2) P {one red, one blue} = P {R1 B1 , R1 B2 , R2 B1 , R2 B2 , R3 B1 , R3 B2 } = 6/10.
This is called the list the sample space(or the sample point) method. It is only
possible because the numbers of fish are small. If instead, we have 30 red fish and 20
blue fish, listing the sample space would be a huge undertaking. There must be a better
way.
1.2. PERMUTATIONS AND COMBINATIONS.
1.2
11
Permutations and Combinations.
When we used the method of listing the sample space, we didnt need to know the exact
form of an event, just the number of outcomes in the event.
Basic Principle of Counting. Suppose there are two operations op1 and op2. If op1
can be done in m ways, and op2 can be done in n ways, then the combined operation
(op1,op2) can be done in mn ways.
Example. Suppose there are two types B1 and B2 of bread, and three types F1 , F2 , F3 of
filling. How many types of sandwich can be made?
Solution. Operation 1 (choose the bread) can be done in 2 ways, and operation 2
(choose the filling) in 3 ways. so the combined operation (make a sandwich) can be
done in 3 2 = 6 ways. The resulting sandwiches are
B1 F1 B2 F1
B1 F2 B2 F2
B1 F3 B2 F3
More generally, if there are k operations, of which the first can be done in m1 ways,
the second in m2 ways,. . . , and the kth in mk ways, then the combined operation can
be done in m1 m2 mk ways.
Example. A committee consisting of a president, a vice-president, and a treasurer is

to be appointed from a club consisting of 8 members. In how many ways can this
committee be formed?
Solution. We have m1 = 8, m2 = 7, and m3 = 6, so number of ways is 8 7 6 = 336.
Example. How many three letter words can be formed from the letters a,b,c,d,e if
(1) each letter can only be used once?
(2) each letter can be used more than once?
12
Solution. (1) 5 4 3 = 60 (2) 5 5 5 = 125.
Factorial Notation.
If n is an integer like 1, 2, . . ., we define

n! = 1 2 3 n.
We also define 0! = 1.
Example. 1! = 1, 2! = 2, 3! = 6, 4! = 24, etc.
Permutations.
time, is
The number of permutations (i.e. orderings) of n objects, taken r at a
Prn = n (n 1) (n 2) (n r + 1) =
n!
.
(n r )!
Example. P24 = 12, P35 = 60. Pnn = n! is the number of ways in which n objects can be
ordered.
Combinations. The number of combinations of n objects taken r at a time (i.e. number of ways you can choose r objects from n objects) is
Crn =
(Crn is also denoted by
C0n =

n
r
n!
.
r !(n r )!
.) Note that
n!
= 1,
0!n!
Cnn =
n!
= 1,
n!0!
n
= n.
Cn1
C1n = n,
Example.
C24 =
4!
= 6,
2!2!
C35 =
5!
= 10.
3!2!
Example. Suppose we have 4 objects A1 , A2 , A3 , A4 . The combinations taken two at a

time are
A1 , A2 A1 , A3 A1 , A4 A2 , A3 A2 , A4 A3 , A4
Note that if write down all the orderings of each of these combinations, we get
A1 A2
A1 A3
A1 A4
A2 A3
A2 A4
A3 A4
A2 A1
A3 A1
A4 A1
A3 A2
A4 A2
A4 A3
which are all the permutations of these 4 objects taken two at a time. That is, P24 = 2!C24 .
In general, we have Prn = r !Crn , which is how the formula for Crn is obtained.
1.2. PERMUTATIONS AND COMBINATIONS.
13
Example. In how many ways can a committee of 3 be chosen from a club of 6 people?
Solution. C36 = 20.
Example. (No. 2.166, 7th ed.) Eight tires of different brands are ranked from 1 to 8
(best to worst). In how many ways can four of the tires be chosen so that the best tire
in the sample is actually ranked third among the eight?
Solution. Identify the tires by their rankings. Among the four tires, one must be tire
3, and the other three must be chosen from tires 4, 5, 6, 7, 8. This latter can be done in

5
= 10 ways, so the answer is 10.
3
Example. A club consists of 9 people, of which 4 are men and 5 are women. In how
many ways can a committee of 5 be chosen, if it is to consist of 3 women and 2 men?
Solution. Let m1 =number of ways to choose the women=C35 , and m2 =number of
ways to choose the men= C24 . Then the number of ways to choose the committee is
C35 C24 = 10 6 = 60.
Example. A box contains 9 pieces of fruit, of which 4 are bananas and 5 are peaches.
A sample of 5 pieces of fruit is chosen at random. What is the probability this sample
will contain
(1) exactly 2 bananas and 3 peaches?
(2) no bananas?
(3) more peaches than bananas?
Solution.
(1) The
number
of ways of choosing a sample consisting of 2 bananas
and 3 peaches

4
5
9
is 2 3 . The number of ways of choosing a sample of 5 is 5 . Hence the answer
is

4
2
5
3
.
9
5
(2)

4
0
1
5 = .
9
9
5
14
(3) Let B be the number of bananas in the sample and P the number of peaches. Then

P [B < P ] = P [B = 0, P = 5]+P [B = 1, P = 4]+P [B = 2, P = 3] =
4
0
5
5

4
1
5
4

Example. Suppose we have n symbols, of which x are Ss and n x are F s. How

many different orderings are there of these n symbols?
Solution. The number of different orderings is just the number of ways we can choose
the x spaces in which to place the Ss, and this is Cxn .
Another solution. Let N be the number of such orderings. Given a single such ordering, place subscripts on the S 0 s and on the F s to distinguish among the x Ss and
among the n x F s. The subscripted Ss give rise to x! possible orderings and the
n x subscripted F s to (n x)! orderings, so permuting all the subscripted Ss and
F s among themselves give rise to x!(n x)! orderings. Then N orderings give rise to
Nx!(n x)! orderings of the subcripted symbols, which must equal n! Solving for N
gives N = Cxn .
Proposition 1.2.1. The number of ways of partitioning n distinct
objects into k distinct
Pk
groups containing n1 , n2 , . . . nk objects respectively, where i=1 ni = n is
!
def
n!
.
n1 !n2 ! nk !
Proof. Let operation 1 be to choose n1 objects for the first group,.. . ,operation k be to
choose nk objects for the kth group. Operation 1 can be done in nn1 ways. Operation

1
2 can be done in nn
ways, and so on. Then the combined operation can be done in
n2
n
n1
5
3
+ + .
9
9
9
5
n
n1 , n2 , , nk
4
2
!
!
n n1
n n1 n2 nk1
n!
=
n2
nk
n1 !n2 ! nk !
ways. The last equality comes after a little arithmetic.

Example. (No. 2.44, 7th ed.) A fleet of nine taxis is to be dispatched to three airports
in such a way that three go to airport A, five to B, and one to C.
(1) If exactly one taxi is in need of repair, how many ways can this be done so that
the taxi that needs repair goes to C?
(2) If exactly three taxis are in need of repair, how many ways can this be done so
that every airport receives one of the taxis needing repair?
1.3. CONDITIONAL PROBABILITY AND INDEPENDENCE.
15
Solution.
(1) Send
the taxi that needs repair to C. The remaining 8 taxis can be dispatched in
8
8!
= 56 ways.
= 3!5!
3,5
(2) The taxis needing
repair can be assigned in 3! ways. The remaining six taxis can
6
6!
= 15 ways. so the answer is 6 15 = 90 ways.
be assigned in 2,4
= 2!4!
Here is a second solution of the red fish, blue fish example.
(2) both fish red?
Solution. For (3), we have P [1 red,1 blue] = P [ red first, bluesecond]+P [ blue first, red second].
1.3
Conditional Probability and Independence.
Suppose a balanced die is tossed in the next room. We are told that a number less than
4 was observed. What is the probability the number was either 1 or 2?
Let
A = {1, 2},
B = {1, 2, 3}.
Then, what is the probability of A given that event B has occurred? This is denoted
by P (A|B). The answer is that if we know that B has occurred, then the sample space
reduces to S 0 = B, and so P (A|B) = two chances in three= 2/3. Now notice that
P (A|B) =
2/6
P (A B)
2
=
=
.
3
3/6
P (B)
This suggests the following definition.

Definition. If A and B are two events with P (B) > 0, then the conditional probability
of A given that B has occurred is
P (A|B) =
P (A B)
.
P (B)
The vertical slash is read as given. Note that P (A|B) P (B|A) in general (in fact, they
are equal iff P (A) = P (B)).
16
Example. Toss two balanced dice. Let A = {sum of 5} and B = {first die is 2}. Then
A B = {(1, 4), (2, 3)}, B = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 1), (2, 2), (2, 3),
(2, 4), (2, 5), (2, 6)}, so
P (A|B) =
2/36
P (A B)
2
=
=
.
12
12/36
P (B)
Example. Two balanced dice are tossed. What is the probability that the first die gives
a number less than three, given that the sum is odd?
Solution. Let A = {first die less than 3} and B = {sum is odd}. Then
A B = {(1, 2), (1, 4), (1, 6), (2, 1), (2, 3), (2, 5)},
so
P (A|B) =
The Multiplicative Rule.
P (A B)
6/36
12
=
=
.
P (B)
1/2
36
This is
P (A B) = P (A|B)P (B)
The following is a third way of doing the red fish, blue fish example.
(2) both fish red?
Solution.
P [red first and blue second] = P [{red first} {blue second}] = P (blue second|red first)P (red first)
2 3
6
.
= =
4 5
20
P [both red] = P [{red first} {red second}] = P (red second|red first)P (red first) =
2 3
6
=
.
4 5
20
P [blue first and red second] = P [{blue first} {red second}] = P (red second|blue first)P (blue first)
3 2
6
= =
.
4 5
20
Hence P [one red, one blue] = P [{red first and blue second}{blue first and red second}] =
P [{red first and blue second}] + P [{blue first and red second}] = 12/20.
17
Example. Toss an unbalanced die with probs p(1) = .1, p(2) = .1, p(3) = .3, p(4) =
.2, p(5) = .1, p(6) = .2. Let A = { 5}, B = { 2}. Since A B = A, then
P (A|B) =
P (A)
.3
P (A B)
=
=
= 1/3.
P (B)
P (B)
.9
Example. Two balanced coins were tossed, and it is known that at least one was a
head. What is the probability that both were heads?
Solution. We have
P [both|at least one] =
P [{both} {at least one}]

P [{both}]
P [{HH}]
1/4
=
=
=
.
P [{at least one}]
P [{at least one}]
P [{HT , T H, HH}]
3/4
Example. Two cards are drawn without replacement from a standard deck. Find the
probability that
(1) the second is an ace, given that the first is not an ace.
(2) the second is an ace.
(3) the first was an ace, given that the second is an ace.
Solution.
(1)
4
.
51
(2)
P [second an ace] = P [second an ace|first an ace]P [first an ace]+
P [second an ace|first not an ace]P [first not an ace]
3
4
4 48
4
=
=
.
51 52 51 52
52
(3)
P [first an ace, second an ace]
P [second an ace]
P [second an ace|first an ace]P [first an ace]
=
P [second an ace]
P [first an ace|second an ace] =
3
51
4
52
4
52
3
.
51
Example. The numbers 1 to 5 are written on five slips of paper and placed in a hat.
two slips are drawn at random without replacement. What is the probability that the
first number is 3, given a sum of seven?
18
Solution. Let A = {first a three} = {(3, 1), (3, 2), (3, 4), (3, 5)}, B = {sum of seven} =
{(2, 5), (3, 4), (4, 3), (5, 2)}. Since A B = {(3, 4)}, and the sample space has 20 outcomes, then
P [first a three|sum of seven] = P (A|B) =
1/20
1
P (A B)
=
= .
P (B)
4/20
4
Example. A card is selected at random (i.e. every card has the same probability of
being chosen) from a deck of 52. What is the probability it is a red card or a face card?
Solution. R = {red card}, F = {face card}. Then P (R F ) = P (R) + P (F ) P (R F ) =
26
6
+ 12
52
= 32
.
52
52
52
Proposition 1.3.1 (Properties of Conditional Probability). Fix an event B with P (B) > 0.
Then
(1) P (S|B) = 1, P (|B) = 0.
(2) P (B|B) = 1.
(3) P (Ac |B) = 1 P (A|B). (Ac =complement of A)
(4) P (C D|B) = P (C|B) + P (D|B) if C D = .
Proof. We have P (S|B) =
P (CD|B) =
P (SB)
P (B)
= 1. If C and D are mutually exclusive events, then
P [(C D) B]
P [(C B) (D B]
P (C B) + P (D B]
=
=
= P (C|B)+P (D|B).
P (B)
P (B)
P (B)
Remark. Fix an event B with P (B) > 0, and for any event A, define Q(A) = P (A|B).
Then Q is a probability.
Proposition 1.3.2. The following are equivalent statements:
(1) P (B|A) = P (B)
(2) P (A|B) = P (A).
(3) P (A B) = P (A)P (B).
19
Definition. Two events A and B are called independent if any one (and therefore all) of
the above conditions holds. We will actually take as our definition the third statement.
Definition.
Two events A and B are called independent if P (A B) = P (A)P (B).
Problem. Show that if A and B are independent, then so are (i) Ac and B, (ii) Ac and
Bc .
Solution. For (i), we have

P [Ac B] = P (B) P (A B) = P (B) P (A)P (B) = [1 P (A)]P (B) = P (Ac )P (B).
Example. Suppose Susan and Georges are writing the 323 exam. The probability that
Susan will pass is .70, and the probability that Georges will pass is .60. What is the
probability that (i) both will pass (ii) at least one will pass?
Solution. Let S = {Susan passes}, G = {Georges passes}. We assume S and G are

independent. Then
(i) P (both pass) = P (S G) = P (S)P (G) = .7 .6 = .42.
(ii) P (at least one passes) = P (S G) = P (S) + P (G) P (S G) = .7 + .6 .42 = .88.
Example. Suppose an unbalanced die with probs p(1) = .1, p(2) = .1, p(3) = .3, p(4) =
.2, p(5) = .1, p(6) = .2 is tossed twice. What is the probability of getting
(1) (3, 2) (i.e. a 3 on the first toss and a 2 on the second)?
(2) A sum of four?
Solution. We are working in the sample space consisting of 36 outcomes.

(1) Let A = {three on first toss}, B = {two on second toss}. Since A and B are independent, then
P [(3, 2)] = P (A B) = P (A)P (B) = p(3)p(2) = .03.
(2) P [sum of four] = P [(1, 3), (2, 2), (3, 1)] = P [(1, 3)]+P [2, 2)]+P [(3, 1)] = p(1)p(3)+
p(2)p(2) + p(3)p(1) = .03 + .01 + .03 = .07.
20
More on Independence. Three events A, B, C are independent if

(1) any two of them are independent,
(2) P (A B C) = P (A)P (B)P (C).
More generally, n events A1 , A2 . . . , An are independent if
(1) any n 1 of them are independent,
(2) P (A1 A2 An ) = P (A1 )P (A2 ) P (An ).
Example. Suppose that Bob applies for admission to 10 medical schools. Suppose his
marks are such that the probability that he will be accepted by any given one of them
is .2. What is the probability that Bob will be entering medical school next year?
Solution. Let
Fi
=
{school number i does not accept Bob},
and
let
F = {Bob will not go to medical school next year}. Then F = F1 F2 F10 . Assuming F1 , . . . , F10 are independent, then
P (F ) = P (F1 )P (F2 ) P (F10 ) = .810 = .107,
so the probability that Bob will be in medical school is P (F c ) = 1 P (F ) = .893.
1.4
Bayes Rule and the Law of Total Probability.
Definition.
The events B1 , B2 , . . . Bn form a partition of S if
(1) they are pairwise mutually exclusive (i.e. Bi Bj = if i j),

(2) n
i=1 Bi = S.
Proposition 1.4.1. Let B1 , B2 , . . . , Bn be a partition of S and let A be any event.
Pn
(1) P (A) = i=1 P (A|Bi )P (Bi ). (This is called the Law of Total Probability.)
(2)
P (A|Bk )P (Bk )
P (Bk |A) = Pn
.
i=1 P (A|Bi )P (Bi )
This is called Bayes Rule.
Proof.
(1) Since B1 , B2 , . . . , Bn form a partition of S, then A B1 , A B2 , . . . , A Bn form a
partition of A. Then
P (A) =
n
X
i=1
P (A Bi ) =
n
X
i=1
P (A|Bi )P (Bi ).
1.4. BAYES RULE AND THE LAW OF TOTAL PROBABILITY.
21
(2)
P (Bk |A) =
P (A Bk )
P (A|Bk )P (Bk )
=
,
P (A)
P (A)
and then we substitute for P (A) from part (1).
B1
B2
B3
Remark. Everything remains valid if n is replaced by . That is if we have a partition

B1 , B2 , . . . , of infinitely many sets.
Example. There are three Canadian firms which build large bridges, firm 1, firm 2,
and firm 3. 20% of Canadian large bridges have been built by firm 1, 30% by firm 2, and
the rest by firm 3. 5% of the bridges built by firm 1 have collapsed, while 10% of those
by firm 2 have collapsed, and 30% by firm 3 have collapsed.
(1) What is the probability that a bridge collapses?
(2) Suppose it is reported in tomorrows newspaper that a large bridge has collapsed.
What is the probability it was built by firm 1?
Solution. Let F 1 = {bridge built by firm 1}, F 2
F 3 = {bridge built by firm 3}, C = {collapse}. Then
{bridge built by firm 2},
P (F 1) = .2, P (F 2) = .3, P (F 3) = .5, P (C|F 1) = .05, P (C|F 2) = .1, P (C|F 3) = .3.

(1) P (C) = P (C|F 1)P (F 1) + P (C|F 2)P (F 2) + P (C|F 3)P (F 3) = (.05 .2) + (.1 .3) +
(.3 .5) = .01 + .03 + .15 = .19.
(2) By Bayes theorem, we have
P (F 1|C) =
.05 .2
.01
1
P (C|F 1)P (F 1)
=
=
=
.
P (C|F 1)P (F 1) + P (C|F 2)P (F 2) + P (C|F 3)P (F 3)
.19
.19
19
Random Sample. Suppose we have a population of N measurements, and we select a

sample of size n from it. Sampling is said to be random if every sample of size n has
the same probability of being chosen as every other. If sampling is without replacement
(the usual case), this probability would be 1/CnN .
22
Chapter 2
Discrete Random Variables.
Definition. Let S be a sample space. A random variable (rv) X on S is a function
X : S R. Let RX denote the range of X. X is called a discrete random variable if RX is
a countable set. In this chapter, we deal with discrete random variables.
2.1
Basic Definitions.
Example. Suppose a coin is tossed three times. Let X be the number of heads observed. The sample space is
HHH
- 3
HHT - 2
HT H
- 2
HT T
- 1
S=
T HH
- 2
T HT
- 1
TTH
- 1
TTT
- 0
That is, we have X(HHH) = 3, X(HHT ) = 2, X(HT H) = 2, and so on. Hence
RX = {0, 1, 2, 3}.
Definition.
Let X be a discrete rv. The function fX : RX [0, 1] defined by

fX (x) = P [X = x],
x RX
is called the probability function of X.

Let A RX . The formula
X
X
P [X A] =
P [X = x] =
fX (x)
xA
xA
is very important. (Note: [X A] is shorthand notation for X 1 (A) = { S : X()

A}.)
The basic properties of a probability function are
23
24
CHAPTER 2. DISCRETE RANDOM VARIABLES.
(1) f (x) 0 for all x,

P
(2) x f (x) = 1.
Any function with these properties will be called a probability function.
Example. Suppose the coin in the previous example is balanced. Then the sample
space is equiprobable and
P [X = 0] = P [F F F ] =
1
,
8
3
,
8
3
P [X = 2] = P [HHT , HT H, T HH] = ,
8
1
P [X = 3] = P [HHH] = .
8
P [X = 1] = P [HT T , T HT , T T H] =
This can be conveniently summarized as
Definition.
fX (x)
1
8
3
8
3
8
1
8
The expected value of a discrete rv X is defined to be

X
X
E(X) =
xfX (x) =
xP [X = x].
xRX
xRX
This is also called the expectation of X, or the mean of X. E(X) is frequently denoted
by X .
Example. For the rv X of the previous example, we have
E(X) = (0
1
3
3
1
) + (1 ) + (2 ) + (3 ) = 1.5.
8
8
8
8
Example. The constant rv X c, where c R, is discrete with RX = {c} and P [X =

c] = 1. Therefore E(X) = cP [X = c] = c. We would rather just write this as E(c) = c.
In particular, E(0) = 0 and E(1) = 1.
RX
g(X)
25
If X : S RX and g : RX R, then the composite function g(X) : S R is defined

by g(X)() = g[X()].
Proposition 2.1.1. Let X be a discrete rv, and let g : RX R. Then the composite function
g(X) is also a rv, and has expected value
X
E[g(X)] =
g(x)fX (x).
xRX
Proof. Let Y = g(X). Partition RX as RX = yRY g 1 (y). Then

X
X
X
X
X
X
g(x)fX (x) =
g(x)fX (x) =
yfX (x) =
y
xRX
yRY xg 1 (y)
yRY xg 1 (y)
yP [X g 1 (y)] =
yRY
yRY
fX (x)
xg 1 (y)
yP [Y = y] = E(Y ).
yRY
Examples.
(1) For the rv X of the previous two examples, we have
2
E(X ) =
3
X
x 2 fX (x) = (0
x=0
(2) If g(x) =
5x
,
x 2 +1
then g(X) =
1
3
3
1
) + (1 ) + (4 ) + (9 ) = 3.
8
8
8
8
5X
.
X 2 +1
Proposition 2.1.2. Let X be a discrete rv.

(1) If g1 (x) and g2 (x) are two functions defined on RX , then E[g1 (X) + g2 (X)] =
E[g1 (X)] + E[g2 (X)],
(2) If c R, then E[cg1 (X)] = cE[g1 (X)].
Proof. We have
E[g1 (X) + g2 (X)] =
[g1 (x) + g2 (x)]fX (x) =
xRX
g1 (x)fX (x) +
xRX
g2 (x)fX (x)
xRX
= E[g1 (X)] + E[g2 (X)].

P
P
Also, E[cg1 (X)] = xRX cg1 (x)fX (x) = c xRX g1 (x)fX (x) = cE[g1 (X)].
Definition.
The variance of a discrete rv X is defined to be

X
Var(X) = E[(X )2 ] =
(x )2 fX (x),
xRX
where
= X = E(X). We also denote Var(X) by X2 . The positive square root X =
p
Var(X) is called the standard deviation of X.
26
Note. If X = c is a constant r.v., we have E(X) = E(c) = c, and so Var(X) = E[(c

c)2 ] = E(0) = 0.
Example. For the rv X of the previous examples, we have
1
3
3
1
Var(X) = [(0 1.5)2 ] + [(1 1.5)2 ] + [(2 1.5)2 ] + [(3 1.5)2 ] = 0.75.
8
8
8
8
Proposition 2.1.3. Var(X) = E(X 2 ) 2 .
Proof.
Var(X) =
(x )2 fX (x) =
xRX
(x 2 2x + 2 )fX (x)
xRX
x 2 fX (x) +
xRX
(2x)fX (x) +
xRX
2 fX (x)
xRX
2
= E(X ) 2E(X) + = E(X ) .
Example. For the rv X of the previous examples, we have

E(X 2 ) 2 = 3 1.52 = 0.75.
Meaning of E(X). Suppose we have an experiment with outcomes w1 , . . . , wm , and we
get xj dollars if outcome wj occurs. Define an r.v. by X(wj ) = xj . X is our payoff when
the experiment is performed. Let pj = P [X = xj ].
Suppose the experiment is performed n times. Call each performance a trial. Suppose payoff xj occurs nj times among the n trials, so that n1 + + nm = n). Our
total payoff over the n trials will be
n1 x1 + n2 x2 + + nm xm .
The average payoff per trial will be
n1
n2
nm
n1 x1 + n2 x2 + + nm xm
=
x1 +
x2 + +
xm .
n
n
n
n
n
For large n, we have ni pi . (After all, that is how we would determine pi .) Hence for
large n, we would have
average payoff per trial p1 x1 + p2 x2 + + pm xm .
In terms of X, this is
E(X) =
m
X
xi P [X = xi ].
i=1
So we think of E(X) as the average value of X if the experiment were repeated a large
number of times.
2.2. SPECIAL DISCRETE DISTRIBUTIONS.
2.2
27
Special Discrete Distributions.
Definition. A rv X that can take only two values (usually 0 and 1 or 1 and 1) is said
to be a Bernoulli rv.
2.2.1
The Binomial Distribution.
Suppose we have an experiment with only two outcomes, S (success) and F (failure),
with probabilities p and q respectively (Note that p + q = 1). For example,
(1) toss a coin
(2) roll a balanced die. Successmight mean getting a six, and failureanything else,
so that p = 1/6 and q = 5/6.
Each time this experiment is performed, it is called a trial (specifically a Bernoulli trial,
because there are only two outcomes). The experiment is performed n times in such a
way that whatevever happens on any one trial is independent of what happens on any
other trial. This is called having n independent trials. Let
X = the number of successes observed in the n trials.
Then X has range set RX = {0, 1, 2, . . . , n}. X is called a binomial random variable. We
write X Bin(n, p).
Proposition 2.2.1. X has probability function given by
!
n x nx
P {X = x} =
p q
, x = 0, 1, 2, . . . , n,
x
where q = 1 p.
Proof. Let us look at the case n = 3. The sample space is
-
p3
SSS
- p 2 q
SSF
- p 2 q
SF S
- pq2
SF F
S=
,
F SS
- p 2 q
- pq2
F SF
- pq2
FFS
F F F -
q3
where for example
P [F F S] = P [{F on 1st trial} {F on 2nd trial} {S on 3rd trial}]
= P [{F on 1st trial}]P [{F on 2nd trial}]P [{S on 3rd trial}] = q2 p.
28
Note that the probability of an outcome depends only on the number of Ss and F s
in the outcome, not their order. So
P {X = 2} = P {SSF , SF S, F SS} = P (SSF ) + P (SF S) + P (F SS) = 3p 2 q.
More generally,
P {X = x} = (number of outcomes with x Ss and n x F s ) p x qnx = Cxn p x qnx .
Remark.
Recall that if a, b R and n = 0, 1, 2, . . . , then we have the Binomial Formula

(a + b)n =
n
X
Cxn ax bnx .
x=0
Proposition 2.2.2. Suppose X Bin(n, p). Then

E(X) = np,
Var(X) = npq.
Proof.
n
X
n
X
n!
(n 1)!
x nx
E(X) =
x
p q
= np
p x1 q(n1)(x1)
x!(n x)!
(x 1)![(n 1) (x 1)]!
x=1
x=1
= np
m
X
m!
p y qmy = np,
y![m
y]!
y=0
where in the next to last equality, we made the changes y = x 1 and m = n 1.

Similarly, we have
E[X(X 1)] =
n
X
x(x 1)
x=2
= n(n 1)p 2
= n(n 1)p
n!
p x qnx
x!(n x)!
n
X
(n 2)!
p x2 q(n2)(x2)
(x
2)![(n
2)
(x
2)]!
x=2
m
X
m!
p y qmy = n(n 1)p 2 ,
y![m
y]!
y=0
where in the next to last equality, we made the changes y = x 2 and m = n 2. Then
E(X 2 ) = E[X(X 1) + X] = E[X(X 1)] + E(X) = n(n 1)p 2 + np, so
Var(X) = E(X 2 ) 2 = n(n 1)p 2 + np n2 p 2 = npq.
There are tables in the back of the textbook which give binomial probabilities. But
they only deal with a few values of n (from 5 to 25), and p.
29
Example. Exxon has just bought a large tract of land in northern Quebec, with the
hope of finding oil. Suppose they think that the probability that a test hole will result
in oil is .2. Assume that Exxon decides to drill 7 test holes. What is the probability that
(1) Exactly 3 of the test holes will strike oil?
(2) At most 2 of the test holes will strike oil?
(3) Between 3 and 5 (including 3 and 5) of the test holes will strike oil?
What are the mean and standard deviation of the number of test holes which strike oil.
Finally, how many test holes should be dug in order that the probability of at least one
striking oil is .9?
Solution. Let X =number of test holes that strike oil. Then X Bin(n = 7, p = .2).
(1) P {X = 3} = C37 (.2)3 (.8)4 = 35 (.2)3 (.8)4 = .115
(2) P {X 2} = P {X = 0} + P {X = 1} + P {X = 2} = C07 .20 .87 + C17 .21 .86 + C27 .22 .85 =
.87 + (7 .21 .86 ) + (21 .22 .85 ) = .852
(3) P {3 X 5} = .148 (using table II in appendix)
E(X) = 7 .2 = 1.4 and Var(X) = 7 .2 .8 = 1.12. For the last question, we have to
find n so that P {X 1} = .9 or more. This is the same as P {X = 0} = .1 or less. But
P }X = 0} = .8n . Hence we have to find n so that .8n = .1 or less. Since
n
.8n
8
.167
9
.134
10
.107
11
.086
then the answer is 11.
2.2.2
The Geometric Distribution.
Suppose as in the previous subsection, we have a sequence of independent Bernoulli

trials, each of which can result in S (success) or F (failure), with probabilities p and q respectively, where 0 < p 1 and p+q = 1. The sample space is S = {S, F S, F F S, F F F S, . . .}.
Let Y be the trial on which the first S is observed. For example, we have Y (S) =
1, Y (F S) = 2, Y (F F S) = 3, . . .. Then Y is a discrete rv with RY = {1, 2, 3, . . .}.
Proposition 2.2.3. (1) Y has probability function P [Y = y] = pqy1 , y = 1, 2, . . .. We
write Y Geom(p).
(2) E(Y ) =
1
p
and Var(Y ) =
q
.
p2
30
Proof. Because the trials are independent,

P [Y = 3] = P [F F S] = P [{F on 1st trial} {F on 2nd trial} {S on 3rd trial}]
= P [{F on 1st trial}]P [{F on 2nd trial}]P [{S on 3rd trial}] = q2 p
for example. Next, for p > 0, we have
E(Y ) =
ypqy1 = p
y=1
yqy1 = p
y=1
X y
1
p
1
q =p
=
=
q y=0
q 1 q
(1 q)2
p
and Var(Y ) can be similarly done.

Proposition 2.2.4. Let Y be a rv taking values in the set N. Then Y Geom(p) iff Y has
the memoryless property
P [Y > m + n|Y > m] = P [Y > n], m, n 1.
(2.1)
P
P
Proof. Assume Y Geom(p). Since P [Y > y] = i=y+1 P [Y = i] = i=y+1 pqi1 =
P
P
p i=y qi = pqy i=0 qi = qy , then
P [Y > m + n|Y > m] =
P [Y > m + n]
qm+n
P [Y > m + n, Y > m]
=
=
= P [Y > n].
P [Y > m]
P [Y > m]
qm
For the converse, assume (2.1) holds, and let g(y) = P [Y > y]. Then g(m + n) =
g(m)g(n) for all m, n 1. This forces g(y) = g(1)y for all y 1. Putting q = g(1)
and p = 1 g(1) gives P [Y = y] = P [Y > y 1] P [Y > y] = qy1 qy = pqy1 .
2.2.3
The Negative Binomial Distribution.
Again, as in the previous subsection, we have a sequence of independent Bernoulli

trials, each of which can result in S (success) or F (failure), with probabilities p and q
respectively, where 0 p 1 and p + q = 1. This time, Y will be the trial on which
the r th S is observed, where r 1. Obviously, the geometric distribution is the special
case of the negative binomial when r = 1.

y1
Proposition 2.2.5. (1) Y has probability function P [Y = y] = r 1 p r qyr , y =
r , r + 1, . . ..
(2) E(Y ) =
r
p
and Var(Y ) =
rq
.
p2
Proof. For y r , we have

P [Y = y] = P [r 1 Ss in first y 1 trials, then S on yth trial]
= P [r 1 Ss in first y 1 trials]P [S on yth trial]
!
!
y 1 r 1 yr
y 1 r yr
=
p
q
p =
p q
r 1
r 1
The mean and variance will be derived later using moment generating functions.
31
Example. (3.92,3.93, p.123) Ten percent of the engines manufactured on an assembly

line are defective. If engines are randomly selected and tested, what is the probability
that
(1) the first nondefective engine will be found on the second trial?
(2) the third nondefective engine will be found on the fifth trial?
(3) the third nondefective engine will be found on or before the fifth trial?
Solution. Let Y = number of trials.
(1) Y Geom(p = .9). Then answer is P [Y = 2] = qp = .1 .9 = .09.

(2) Y NBin(p = .9, r = 3). Then answer is P [Y = 5] = 51
.93 .12 = 6(.9)3 (.1)2 =
31
.04374.
(3) Y NBin(p = .9, r = 3). Then answer is P [Y 5] = P [Y = 3] + P [Y = 4] + P [Y =
5] = .729 + .2187 + .04374 = .99144.
2.2.4
The Hypergeometric Distribution.
Suppose we have a box containing a total of N marbles, of which r are red and b are
black (so r , b 0 and r + b = N). A sample of size n is chosen randomly and without
replacement. Let Y be the number of red balls in the sample. Then Y has probability
function

P [Y = y] =
r
y
Nr
ny

N
n
0 y r, n y N r.
Proposition 2.2.6.
E(Y ) =
2.2.5
nr
N

and
Var(Y ) = n
r
N

N r
N

N n
.
N 1
The Poisson Distribution.
Definition.
A discrete random variable X having the probability function
x e
, x = 0, 1, 2, . . .
(2.2)
x!
is said to have the Poisson distribution with parameter > 0. We write X Poisson().
P [X = x] =
Check. We did not derive this distribution. Hence we have to check that (2.2) really is
a probability function. But obviously P [X = x] 0, and
X
x=0
So ok.
P [X = x] =
X
X
x e
x
= e
= e e = 1.
x!
x!
x=0
x=0
32
Example. If X Poisson() has P [X = 2] = 2P [X = 3], find P [X = 4].

Solution. We are given
2 e
2
3 e
= 2
, so = 32 . Then P [X = 4] =
(1.5)4 e1.5
24
= .04707.
Proposition 2.2.7. Let X Poisson(). Then

E(X) = ,
Var(X) = .
Proof. We have
X
x e
x1
E(X) =
xP [X = x] =
x
= e
= .
x!
(x 1)!
x=0
x=1
x=1
To compute Var(X), we compute E[X(X 1)] and proceed as with the binomial.
Proposition 2.2.8. Suppose X Bin(n, p). Then
P [X = x]
x e
x!
as n and p 0 in such a way that = np remains constant.
Proof. We have
!
n x
n!
x
n(n 1) (n x + 1) x
p (1 p)nx =
x (1 )nx =
(1 )n (1 )x
x
x
x!(n x)! n
n
n
x!
n
n
= 1 (1
2
x1
x
x e
1
)(1 ) (1
)
(1 )n (1 )x
n
n
n
x!
n
n
x!
since (1 n )n e and (1 n )x 1.
Remark.
Thus, for large n and small p, we can approximate the binomial probability

x e
n
x
p (1 p)nx by x!
, where = np. This approximation is considered goodif
x
np 7.
Example. X Bin(n = 20, p = .05).
x
P[X=x](exact binomial)
Poisson Approx ( = 1)
2.3
.358
.368
.378
.368
.189
.184
.059
.061
.013
.015
Moment Generating Functions.
Definition. Let X be a random variable and k an integer with k 0. Suppose that

E(|X k |) < Then the number k0 = E(X k ) is called the kth moment of X about the
origin. The number k = E[(X )k ] (where = 1 = E(X)) is called the kth moment
of X about its mean.
2.3. MOMENT GENERATING FUNCTIONS.

Definition.
, then
33
Let X be a r.v. If there exists a > 0 such that E(etX ) < for all < t <
def
MX (t) = E(etX ),
< t <
is called the moment generating function (mgf) of X. For a discrete r.v., we have
X
MX (t) =
etx fX (x).
xRX
Examples.
(1) If X = c, then MX (t) =def E(etc ) = etc .
(2) If X Bin(n, p), then
MX (t) =
n
X
!
!
n
X
n x nx
n
p q
=
(pet )x qnx = (pet + q)n .
x
x
x=0
tx
x=0
Note that this is finite for all t R.

(3) If X Geom(p) where p > 0, then
MX (t) =
etx pqx1 = pet
x=1
(qet )x1 =
x=1
pet
< if qet < 1.
1 qet
qet < 1 is equivalent to t < log q1 , so we may take = log q1 > 0 (since p > 0).
(4) X Poisson(). Then
MX (t) =
tx
x=0
e
x!
=e
X
(et )x
t
t
= e ee = e(1e ) ,
x!
x=0
which is finite for all t R.

Next, what are MGFs good for?
Proposition 2.3.1.
(n)
MX (0) = E(X n ), n = 0, 1, . . . .
Proof. MX (0) = E(e0 ) = E(1) = 1. From (2.3), we have
X
MX0 (t) =
xetx fX (x)
xRX
MX "(t) =
x 2 etx fX (x)
xRX
..
.
(n)
MX (t) =
x n etx fX (x),
xRX
from which MX0 (0) = E(X), MX "(0) = E(X 2 ), and so on.
(2.3)
34
Examples.
(1) X Bin(n, p). Then
MX0 (t) =
d
(pet + q)n = n(pet + q)n1 pet ,
dt
so E(X) = MX0 (0) = np. Next,

M"X (t) = n(pet + q)n1 pet + n(n 1)(pet + q)n2 (pet )2 ,
so E(X 2 ) = MX "(0) = np+n(n1)p 2 and Var(X) = np+n(n1)p 2 n2 p 2 = npq.
(2) X Geom(p). Then
MX0 (t) =
so E(X) = MX0 (0) =
d pet
(1 qet )pet pet (qet )
=
,
dt 1 qet
(1 qet )2
1
.
p
Example. Suppose r.v. X has probability function

x
fX (x)
.2
.3
.4
.1
Find the moment generating function of X and use it to calculate E(X) and Var(X).
Solution. MX (t) = .2 + .3et + .4e2t + .1e3t , so MX0 (t) = .3et + .8e2t + .3e3t and MX "(t) =
.3et + 1.6e2t + .9e3t . Then E(X) = MX0 (0) = 1.4, E(X 2 ) = MX "(0) = 2.8, and Var(X) =
E(X 2 ) [E(X)]2 = 2.8 1.42 = .84.
Remark.
If X has mgf MX (t), then

"
MX (t) =
=
tx
e fX (x) =
xRX
X
X
xRX
n=0
(tx)2 (tx)3
+
+
1 + tx +
2!
3!
X
(tx)n
t n E(X n )
fX (x) =
n!
n!
n=0
n=0 xRX
X
n0 t n
n!
#
fX (x)
Chapter 3
Continuous Random Variables.
3.1
Distribution Functions.
Before getting to continuous random variables, we need the concept of a distribution

function, which is valid for all types of random variables.
Definition. Let X be a random variable on a sample space S and let P be a probability
on S. The function
F (x) = P [X x], x R
is called the distribution function of X.
Example. Suppose X is the number that results when an unbalanced die having probabilities
x
fX (x)
.2
.1
.2
.1
.2
.2
is tossed. Find and plot the distribution function of X.

Solution.
.2
.3
F (x) = .5
.6
.8
if < x < 1,
if 1 x < 2,
if 2 x < 3,
if 3 x < 4,
if 4 x < 5,
if 5 x < 6,
if 6 x < +,
Here is a sample calculation.

F (3.6) = P [X 3.6] = P [X 3] = P [X = 1] + P [X = 2] + P [X = 3] = .5.
35
36
CHAPTER 3. CONTINUOUS RANDOM VARIABLES.
Proposition 3.1.1. Every distribution function F (x) has the following properties:
(1) F is nondecreasing. i.e. if x y, then F (x) F (y),
(2) F (x) 0 as x and F (x) 1 as x +,
(3) F is continuous from above (from the right). i.e. F (y) F (x) as y x.
Proof. (1) If x y, then {X x} {X y}, so P {X x} P {X y}.
Remarks.
(1) Conversely, any function F : R [0, 1] with the above three properties is called a
distribution function. It can be shown that given any distribution function F , there
exists a probability space (S, P ) and on it a rv X which has F as its distribution
function.
(2) If X is any r.v. and if a < b, we have
P [a < X b] = F (b) F (a).
This is because {X b} = {X a} {a < X b} (disjoint), so P {X b} = P {X
a} + P {a < X b}.
(3) The figure below shows the distribution function of a continuous r.v., and of a
mixed (part discrete, part continuous) r.v.
3.2. CONTINUOUS RANDOM VARIABLES.
3.2
37
Continuous Random Variables.
Definition. Let X be a r.v. with distribution function F (x). If there exists a function
f : R R such that
Zx
F (x) =
f (t)dt, x R,
(3.1)
then X is called a continuous random variable with density function f . Note that if f is
continuous, then by the fundamental theorem of calculus, we also have F 0 (x) = f (x)
for all x.
Proposition 3.2.1. f has the properties:
(1) f (x) 0 for all x R,
R +
(2) f (x)dx = 1.
Proof. By the fundamental theorem of calculus, we have f (x) = F 0 (x) 0 since F is
nondecreasing. Also,
Zx
1 = lim F (x) = lim
x+
x+
Z +
f (t)dt =
f (x)dx.
Remarks.
(1) Conversely, any function f : R R with the above two properties is called a
density function.
(2) If f is a density function, then F defined by (3.1) is a distribution function, so
there exists a r.v. X having F as its distribution function and therefore f as its
density function.
Proposition 3.2.2. Let X be a continuous r.v. with density function f .
(1) If a < b, then
Zb
P [a < X b] =
f (x)dx.
a
Note that this is the area under the graph of f between a and b.
(2) P [X = x] = 0 for every x R.
Proof.
(1) We have
Zb
P [a < X b] = F (b) F (a) =
Za
f (x)dx
Zb
f (x)dx =
f (x)dx.
a
38
(2) If > 0,
Zx
P [X = x] P [x < X x] =
f (t)dt 0 as 0,
x
implying that P [X = x] = 0.
Remark.
Because of part (2) we can say that

P [a X b] = P [a < X b] = P [a X < b] = P [a < X < b].
For example, {a X b} = {a < X b} {X = a}, so P {a X b} = P {a < X

b} + P {X = a} = P {a < X b}.
Note: Let h : R R. The integral
R +
h(x)dx is said to exist if
R +
|h(x)|dx < +.
Definition. Let X be a continuous r.v. with density function f (x). The expected value
(or mean, or expectation) of X is defined to be
Z +
E(X) =
xf (x)dx,
provided this integral exists.

Proposition 3.2.3. Let X be a continuous rv, and let g : RX R. Then the composite
function g(X) is also a rv, and has expected value
Z +
g(x)f (x)dx,
E[g(X)] =
provided this integral exists.

Proposition 3.2.4. Let X be a continuous r.v. with density f (x).
(1) If g1 (x) and g2 (x) are two functions R R, then E[g1 (X) + g2 (X)] = E[g1 (X)] +
E[g2 (X)],
(2) If c R, then E[cg1 (X)] = cE[g1 (X)].
Proof. We have
Z +
E[g1 (X) + g2 (X)] =
Z +
[g1 (x) + g2 (x)]f (x)dx =
Z +
g1 (x)f (x)dx +
= E[g1 (X)] + E[g2 (X)].

Also, E[cg1 (X)] =
R +
cg1 (x)f (x)dx = c
R +
g1 (x)f (x)dx = cE[g1 (X)].
g2 (x)f (x)dx
3.2. CONTINUOUS RANDOM VARIABLES.

Definition.
39
Let X be a continuous r.v. with density f (x). The variance of X is

2
Z +
= Var(X) = E[(X ) ] =
(x )2 f (x)dx,
where = E(X).
Once again, we have
Var(X) = E(X 2 ) 2 .
R +
R +
R +
This
is because (xR )2 f (x)dx = (x 2 2x + 2 )f (x)dx = x 2 f (x)dx
R +
+
2 xf (x)dx + 2 f (x)dx = E(X 2 ) 2 2 + 2 = E(X 2 ) 2 .
kx 2
Example. Suppose X has density function f (x) =
0
if 0 < x < 1,
otherwise.
Find
(1) k,
(2) the distribution function F (x),
(3) P [ 41 < X < 12 ],
(4) E(X),
(5) Var(X).
Solution.
(1) 1 =
R +
f (x)dx =
R0
0dx +
R1
0
kx 2 dx +
R +
1
0dx = k
R1
0
x 2 dx = k 13 , so k = 3.
Rx
(2) F (x) = f (t)dt. If x 0, then obviously F (x) = 0. If 0 < x < 1, then
R0
Rx
F (x) = 0dx + 0 3t 2 dt = x 3 . If 1 x < +, then F (x) = 1.
(3) P [ 41 < X < 12 ] = F ( 12 ) F ( 14 ) = ( 12 )3 ( 41 )3 =
(4) E(X) =
R +
xf (x)dx =
R1
0
7
.
64
x 3x 2 dx = 34 .
R +
R1
(5) E(X 2 ) = x 2 f (x)dx = 0 x 2 3x 2 dx =
3
3
( 34 )2 = 80
.
5
3
.
5
So Var(X) = E(X 2 ) [E(X)]2 =
Proposition 3.2.5. Let X be a discrete or continuous r.v., and let a and b be constants.
Then
Var(aX + b) = a2 Var(X).
40
Proof. We have (aX + b) E(aX + b) = aX + b [aE(X) + E(b)] = a[X E(X)], so

2
2
Var(aX + b) = E [(aX + b) E(aX + b)] = E [a[X E(X)]] = a2 Var(X).
For a continuous r.v. X with density f (x), the definition of moment generating
function as given in 2.3 becomes
Z +
tX
MX (t) = E(e ) =
etx f (x)dx.
(3.2)
of course, for this mgf to exist, there has to be a > 0 such that the integral exists for
all t with < t < .
In the continuous case, the mgf generates moments exactly as in the discrete case.
Proposition 3.2.6.
(n)
MX (0) = E(X n ), n = 0, 1, . . . .
Proof. MX (0) = E(e0 ) = E(1) = 1. From (3.2), we have
Z +
0
xetx f (x)dx
MX (t) =
Z
+
MX "(t) =
x 2 etx f (x)dx
..
.
(n)
MX (t)
Z +
=
x n etx f (x)dx,
from which MX0 (0) = E(X), MX "(0) = E(X 2 ), and so on.

Proposition 3.2.7 (Properties of a MGF Contd). Let X be any r.v. Then for any constants
a, b, we have
MaX+b (t) = etb MX (at).
Proof. MaX+b (t) = E[et(aX+b) ] = E[etb etaX ] = etb MX (at).
3.3
Special Continuous Distributions.
From the previous section, we know that if we specify a density function f (x), there
will exist a r.v. X having f (x) as its density function.
3.3.1
The Uniform Distribution.
Definition.
Let a, b R with a < b. A r.v.
0
1
f (x) = ba
X having density function

if < x < a,
if a x b,
if b < x < +,
is said to be uniformly distributed on [a, b]. We write X Unif[a, b].
3.3. SPECIAL CONTINUOUS DISTRIBUTIONS.
41
Proposition 3.3.1 (Properties of the Uniform Distribution). Suppose X Unif[a, b].

Then
(1) E(X) =
a+b
2
(2) Var(X) =
(the midpoint of [a, b]),
(ba)2
,
12
(3) X has distribution function
F (x) =
if x < a,
if x > b.
xa
ba
(4) if a < c < d < b, then P [c X d] =

(5) X has mgf MX (t) =
if a x b,
dc
,
ba
etb eta
.
t(ba)
Proof.
Ra
Rb
R +
Rb
1
(1) E(X) = xf (x)dx + a xf (x)dx + b xf (x)dx = a x ba
dx =
h 2 i
1
x b
1 b2 a2
b+a
=
=
.
ba
(2) E(X 2 ) =
ba
Rb
a
x2
Rb
1
ba a
xdx =
1
dx
ba
Var(X) = E(X 2 ) [E(X)2 ]
Rb 2

1
1
x 3 b
1 b3 a3
x dx = ba
= ba
ba a
3 a
3
b2 +ab+a2
(b+a)2
(ba)2
=
4 = 12 .
3
h
b2 +ab+a2
,
3
so
Rx
(3) F
(x)
=
f (t)dt. Obviously, F (x) = 0 if x < a. If a x b, then F (x) =
Rx 1
xa
a ba dt = ba . If x > 1, then F (x) = area under f between and x=1.
(4) P [c X d] = F (d) F (c) =
(5) MX (t) =
etx
a ba dx
Rb
1
ba
etx b
a
t
da
ba
ca
ba
etb eta
.
t(ba)
dc
.
ba
42
3.3.2
The Exponential Distribution.
Definition.
A r.v. Y having density function
0
g(y) = 1 y/
e
if y 0,
if y > 0,
is said to have the exponential distribution with parameter > 0. We write Y Exp().
Proposition 3.3.2 (Properties of the Exponential Distribution). Suppose Y Exp().

Then
(1) E(Y ) = ,
(2) Var(Y ) = 2 ,
(3) Y has distribution function
0
G(y) =
1 ey/
(4) Y has mgf MY (t) =
1
1t
if t <
if t
if y < 0,
if y 0,
Proof.
R
R0
R +
R + 1
(1) E(Y ) = yg(y)dy + 0 yg(y)dy = 0 y ey/ dy = 0 wew dw =
after an integration by parts.
R +
R
(2) E(Y 2 ) = 0 y 2 1 ey/ dy = 2 0 w 2 ew dw = 22 after an integration by parts.
R + ty
R + ty 1 y/
(4) MY (t) =
=
e e
dy
e g(y)dy
0
1
1 ey(1/t)

if t < ,

= (1/t) 0
= as given.
+
if t 1
R
1 +
0
ey(1/t) dy
43
Problem. Show that if Y Exp(), then n0 = n n!.

Solution. We have MY (t) =
1
1t
n
n=0 (t)
for |t| <
1
,
the uniqueness of McLaurin series expansions, we get n =
and MY (t) =
0
n
.
n!
n=0
0 tn
n
.
n!
By
Proposition 3.3.3 (Memoryless Property). If Y Exp(), then Y has the memoryless

property
P [Y > s + t|Y > s] = P [Y > t], s, t 0.
(3.3)
Conversely, if Y is a continuous r.v. having the memoryless property, then Y has an
exponential distribution.
Proof. Assume Y Exp(). Since P [Y > y] = 1 P [Y y] = ey/ , then
P [Y > s +t|Y > s] =
P [Y > s + t]
e(s+t)/
P [Y > s + t, Y > s]
=
=
= et/ = P [Y > t].
P [Y > s]
P [Y > s]
es/
For the converse, assume (3.3) holds and let h(y) = P [Y > y], y > 0. Then h(s + t) =
h(s)h(t) for all s, t 0. This is Cauchys equation and forces h(y) = eay for all y 0.
Since h(y) 1 for all y, then a < 0.
Thus the exponential distribution is the continuous analog of the geometric distribution.
3.3.3
The Gamma Distribution.
Definition.
The function
Z
() =
x 1 ex dx,
> 0,
is called the gamma function.

Proposition 3.3.4 (Properties of the Gamma Function).
0,
(1) 0 < () < + for all >
(2) (1) = 1,
(3) ( + 1) = (), > 0,
(4) (n + 1) = n!, n = 0, 1, 2, . . .,
(5) ( 12 ) = (This will be proved in the next section.).

Proof.
R1
R +
R1
R1
(1) () = 0 x 1 ex dx + 1 x 1 ex dx. But 0 x 1 ex dx 0 x 1 dx =
R + 1 x
x
e dx < + (can be proved).
1
R
(2) (1) = 0 ex dx = 1,
R
R
(3) ( + 1) = 0 x ex dx = x ex 0 0 ex x 1 dx = 0 + ().
and
44
Definition.
A r.v. X having density function
0
f (x) =
1 x 1 ex/
()
if x 0,
if x > 0,
is said to have the gamma distribution with parameters , > 0.

Gamma(, ).
Check.
We write X
We have to verify that this is really a density function. We have

Z
Z +
Z
1
1
1 x/
x 1 ex/ dx
x
e
dx =
f (x)dx =
()
()
0
0
Z
=
w 1 ew dw = 1,
() 0
where we made the substitution w = x/.
Proposition 3.3.5 (Properties of the Gamma Distribution). Suppose X Gamma(, ).

Then
(1) E(X) = ,
(2) Var(X) = 2 ,
(3) X has mgf MX (t) =
1
(1t)
if t < 1 ,
if t 1 .
Proof. From the above check, we have
R
0
x 1 ex/ dx = () , which will be useful.
45
(1)
Z +
1
E(X) =
xf (x)dx =
()
( + 1)+1
=
= .
()
Z
xx
1 x/
1
dx =
()
x (+1)1 ex/ dx
(2)
Z
Z
1
1
2
1 x/
E(X ) =
x f (x)dx =
x x
e
dx =
x (+2)1 ex/ dx
() 0
() 0
( + 2)+2
=
= ( + 1)2 = 2 + 2 2 .
()
2
Z +
Hence Var(X) = E(X 2 ) [E(X)]2 = 2 .

(3)
Z
Z
1
1
tx
1 x/
MX (t) =
e x
e
dx =
x 1 ex(1/t) dx
() 0
() 0
Z
1
1
1
0
x 1 ex/ dx (where 0 = t)
=
() 0
Note that we will have MX (t) = + if 0 0. Hence assume 0 > 0. Then

continuing on,
!
()(0 )
=
= (1 t)
MX (t) =
()
0
Remark. If X Gamma(, ) and is not an integer, then probabilities like P [a <

Rb
X b] will require numerical evaluation of integrals like a x 1 ex/ dx. If is an
integer, this integral can be done using integration by parts.
3.3.4
The Normal Distribution.
This is the most important distribution of all. The reason is the Central Limit Theorem,
which we will see in chapter 6.
Definition.

f (x) =
1 x 2
1
e 2 ( ) ,
2
< x <
(3.4)
where R and > 0 is said to have a normal (or Gaussian) distribution with parameters and We write X N(, 2 ).
When plotted, the density function looks like
46
A r.v. Z with distribution N(0, 1) is said to have the standard normal distribution.
Its density function looks like
We have to show that f in (3.4) is a density function. We have
Check.
Z +
where I =
R +
0
Z +
I =
1
f (x)dx =
2
ey
y 2 /2
2 /2
Z +
e
1
2
) dx = 1
2
2
dy
z2 /2
Z + Z +
dz
y 2 /2
dy =
dy. (We made the substitution y =
! Z
+
Z +
x
.)
y 2 /2 z2 /2
2
I,
Next,
Z + Z +
dydz =
0
e(y
(changing to polar coordinates r 2 = y 2 + z2 , = tan1 (z/y))

Z /2 Z +
e
=
0
r 2 /2
where u = r 2 /2, so
Remark.
r dr d =
2
R +
Z
re
r 2 /2
dr =
2
eu du
f (x)dx = 1.
Making the substitution x = w 2 /2, we have
1
( ) =
2
Z +
0
x 1/2 ex dx =
Z +
ew
2 /2
dw = 2
Z +
0
ew /2
dw = ,
2
an important property of the gamma function.

Proposition 3.3.6.
(1) If X N(, 2 ) and Z =
X
,
then Z N(0, 1).
2 +z 2 )/2
dydz
47
(2) If Z N(0, 1) and X = aZ + b where a 0, then X N(b, a2 ). That is, X has

density

2
1
12 xb
a
f (x) =
e
, < x <
|a| 2
Proof.
R + z 1 x 2
R z w 2 /2
X
1
dw
(1) P [Z z] = P [ z] = P [X + z] = 12 e 2 ( ) dx = 2
e
x
after the substitution w = .
(2) Similar to (1).
Proposition 3.3.7. If X N(, 2 ), then E(X) = and Var(X) = 2 .
Proof. First suppose Z N(0, 1). We have E(Z) = 0 either by odd symmetry, or
"
+ #
Z +
1
1
2 /2
2 /2
z
z

E(Z) =
ze
dz =
e
= 0.

2
2
Next, using the substitution w = z2 /2 (so dw = zdz), we obtain

Z +
Z +
Z
2
2 2 + 1 w
1
2 /2
2 /2
2
z
2
z
2
z e
dz =
z e
dz =
w 2 e dw
E(Z ) =
2
2 0
2 2 0
2
3
= ( ) = 1.
2
(since ( + 1) = () and ( 12 ) = ).
X
Now let X N(, 2 ). Define Z = , so Z N(0, 1). Since conversely, X = Z +,
then E(X) = E( Z) + E() = E(Z) + = . Also, E(X 2 ) = E[ 2 Z 2 + 2 Z + 2 ] =
2 E(Z 2 ) + 2 E(Z) + E( 2 ) = 2 + 0 + 2 . Then Var(X) = E(X 2 ) 2 = 2 .
We could also have calculated the mean and variance here from the mgf of the
normal distribution, which is given in the next proposition.
Proposition 3.3.8. If X N(, 2 ), then
MX (t) = et+
2 t2
2
t R.
Proof. We start with Z N(0, 1). We have

Z +
Z +
Z
2
1
et /2 + (t 2 2tz+z2 )/2
1
2 /2
2 )/2
tz
z
(2tzz
e e
dz =
e
dz =
e
dz
MZ (t) =
2
2
2
Z
Z
2
2
et /2 + (zt)2 /2
et /2 + w 2 /2
=
e
dz =
e
dw
2
2
= et
2 /2
In the general case, we can write X = Z + , where Z N(0, 1). Then

MX (t) = E[etX ] = E[et( Z+) ] = E[et Z et ] = et E[et Z ] = et MZ ( t) = et e
2 t2
2
= et+
2 t2
2
To calculate probabilities concerning normal random variables, we use tables of

the standard normal distribution, found at the end of these notes or the back of the
textbook.
48
Example. Suppose Z N(0, 1). Find

(1) P {0 Z 1.4}, (Answer=.4192)
(2) P {0 Z 1.42}, (Answer=.4222)
(3) P {Z > 1.42}, (Answer=.5 .4222 = .0778)
(4) P {Z < 1.42}, (Answer=.0778)
(5) P {1.5 < Z < 1.42} = P {0 < Z < 1.42} + P {0 < Z < 1.5} = .4222 + .4332 = .8554,
(6) P {1.25 < Z < 1.42} = P {0 < Z < 1.42}P {0 < Z < 1.25} = .4222.3944 = .0278.
Example. Suppose Z N(0, 1). Find z so that

(1) P {Z > z} = .05, (Answer=1.645)
(2) P {Z > z} = .025. (Answer=1.96)
Notation. Suppose Z N(0, 1). Let be a number with 0 < < 1. Then z is the
number along the horizontal axis such that P {Z > z } = .
.1
1.282
.05
1.645
z!
.025
1.96
.01
2.326
.005
2.576
Example. Suppose X N(2, 9). Find P [6.5 < X < 2.26].

Solution.
P [6.5 < X < 2.26] = P [4.5 < X < 4.26] = P [
= P [1.5 < Z < 1.42] = .8554.
4.5
X
4.26
<
<
]
3
3.3.5
49
The Beta Distribution.
Definition.
(+) x 1 (1 x)1 , if 0 < x < 1,

f (x) = () ()
0
otherwise,
where > 0 and > 0, is said to have a beta distribution with parameters and . We
write X Beta(, ). Notice that Unif(0,1) = Beta(1, 1).
Check.
It can be shown that

Z1
() ()
x 1 (1 x)1 dx =
,
( + )
0
and so f is a density function.

Proposition 3.3.9. Suppose X Beta(, ). Then
E(X) =
,
+
Var(X) =
( +
)2 (
+ + 1)
The mgf cannot be obtained in closed form.
3.3.6
The Cauchy Distribution.
Definition.

f (x) =
1
1
,
1 + x2
< x < +,
is said to have the Cauchy distribution on (, +).

Check.
We have
Z +
Z
+
1
1
2 + dx
2
2
= 1.
dx =
= tan1 (x)0 =
2
2
0 1+x
2
1 + x
Now note that (where y = x 2 )

Z +
Z +
+
1
dy

2
x
dx
=
=
log(1
+
y)
= +.
0
1 + x2
1+y
0
0
If we try to calculate E(X), we get
Z +
Z0
Z +
1
1
1
1
1
1
x
x
x
E(X) =
dx =
dx +
dx
2
2
1+x
1+x
1 + x2
0
Z +
Z +
1
1
1
1
=
y
dy
+
x
dx = + ,
1 + y2
1 + x2
0
0
where y = x. So E(X) does not exist.
50
Problem. A r.v. X having density function
2 1 , if x 0,
2
f (x) = 1+x
0
if x < 0
is said to have the Cauchy distribution on [0 + ). Show that f is a density function
and that E(X) = + (but we do not have the problem).
3.4
Chebychevs Inequality.
Proposition 3.4.1 (Markovs Inequality). Let X be a non-negative r.v. (i.e. P [X 0] = 1).

Then
E(X)
,
P [X > ]

for any > 0.
Proof. Assume X is a continuous r.v. with density f (x). Since X is nonnegative, f (x) =
R0
0 for x < 0. (This is because f (x)dx = P [X 0] = 0.) Then
Z +
Z
Z +
Z +
Z +
E(X) =
xf (x)dx =
xf (x)dx+
xf (x)dx
xf (x)dx
f (x)dx = P [X > ],
which gives the required result. The case where X is discrete is identical.
Proposition 3.4.2 (Chebychevs Inequality). Let X be a r.v.with finite mean and variance 2 . Then
2
P [|X | > ] 2 ,

for any > 0.
Proof. Applying the previous proposition, we have
P [|X | > ] = P [|X |2 > 2 ]
E(|X |2 )
2
=
.
2
2
Remark. Chebychevs inequality illustrates that the smaller the variance of X is, the
closer the value of X is likely to be to its mean .
Remark. Recall that if X = c is a constant r.v., then E(X) = c and Var(X) = 0. The
following shows that the converse is also true.
Proposition 3.4.3. Let X be a random variable with mean for which Var(X) = 0. Then
P [X = ] = 1.
Proof. By Chebychevs inequality, P [|X | > ] = 0 for every > 0. This can only be
true if P [|X | > 0] = 0.
Chapter 4
Multivariate Distributions.
4.1
Definitions.
Definition.
Let Y1 and Y2 be discrete r.v.s on the same sample space S. The function
f (y1 , y2 ) = P [Y1 = y1 , Y2 = y2 ],
< y1 , y2 < +
is called the joint probability function (jpf) of Y1 and Y2 .

Proposition 4.1.1. The jpf f (y1 , y2 ) of Y1 and Y2 satisfies
(1) f (y1 , y2 ) 0 for all y1 , y2 R,
PP
(2)
(y1 ,y2 ) f (y1 , y2 ) = 1, where the sum is over all (y1 , y2 ) for which f (y1 , y2 ) > 0.
Proposition 4.1.2. Let Y1 and Y2 be discrete r.v.s with jpf f (y1 , y2 ). If A R2 , then
XX
P [(Y1 , Y2 ) A] =
f (y1 , y2 ).
(y1 ,y2 )A
Proof. This follows from {(Y1 , Y2 ) A} =
{Y1 = y1 , Y2 = y2 }.
(y1 ,y2 )A
Next, we want to define the continuous analog of a jpf. To do this, we need, just as
in the univariate situation, the idea of a joint distribution function.
Definition. Let Y1 and Y2 be (any kind of) r.v.s defined on the same sample space S.
The function
F (y1 , y2 ) = P [Y1 y1 , Y2 y2 ],
< y1 , y2 < +,
is called the joint distribution function (JDF) of Y1 and Y2 .

Proposition 4.1.3. Let F (y1 , y2 ) be the JDF of Y1 and Y2 . If y1 y1 and y2 y2 , then
P [y1 < Y1 y1 , y2 < Y2 y2 ] = F (y1 , y2 ) F (y1 , y2 ) F (y1 , y2 ) + F (y1 , y2 ). (4.1)
51
52
CHAPTER 4. MULTIVARIATE DISTRIBUTIONS.
Proof. We have
{y1 < Y1 y1 , y2 < Y2 y2 } = {y1 < Y1 y1 , Y2 y2 } \ {y1 < Y1 y1 , Y2 y2 },
so
P [y1 < Y1 y1 , y2 < Y2 y2 ] = P [y1 < Y1 y1 , Y2 y2 ]P [y1 < Y1 y1 , Y2 y2 ].
Also, {y1 < Y1 y1 , Y2 y2 } = {Y1 y1 , Y2 y2 } \ {Y1 y1 , Y2 y2 }, so
P [y1 < Y1 y1 , Y2 y2 ] = P [Y1 y1 , Y2 y2 ] P [Y1 y1 , Y2 y2 ] = F (y1 , y2 )
F (y1 , y2 ). Similarly, P [y1 < Y1 y1 , Y2 y2 ] = F (y1 , y2 ) F (y1 , y2 ).
Proposition 4.1.4 (Properties of a JDF). Let F (y1 , y2 ) be the JDF of Y1 and Y2 . Then
(1) F (, ) = F (, y2 ) = F (y1 , ) = 0, F (+, +) = 1,
(2) If y1 y1 and y2 y2 , then
F (y1 , y2 ) F (y1 , y2 ) F (y1 , y2 ) + F (y1 , y2 ) 0.
Remark.
F (y1 , ) stands for limy2 F (y1 , y2 ) and so on.
Remark. Conversely, given a function F (y1 , y2 ) satisfying conditions (1) and (2), we
can find a probability space (S, P ) and on it r.v.s Y1 and Y2 having F (y1 , y2 ) as their
JDF.
Definition. Let Y1 and Y2 be r.v.s defined on the sample space S, with JDF F (y1 , y2 ).
If there exists a function f (y1 , y2 ) such that
Z y1 Z y2
F (y1 , y2 ) =
f (t1 , t2 )dt2 dt1 , < y1 , y2 < +,
(4.2)
we say that Y1 and Y2 are jointly continuous with joint density function (jdf) f (y1 , y2 ).
Proposition 4.1.5. Let Y1 and Y2 be jointly continuous r.v.s with jdf f (y1 , y2 ). Then
(1) f (y1 , y2 ) 0 for all y1 , y2 R,
R + R +
(2) f (y1 , y2 )dy1 dy2 = 1.
Proposition 4.1.6. Let Y1 and Y2 be jointly continuous r.v.s with jdf f (y1 , y2 ). Then
Z b2 Z b1
P [a1 < Y1 b1 , a2 < Y2 b2 ] =
a2
a1
f (y1 , y2 )dy1 dy2 .
(4.3)
More generally, if A R2 , then

ZZ
P [(Y1 , Y2 ) A] =
f (y1 , y2 )dy1 dy2 .
(4.4)
4.2. MARGINAL DISTRIBUTIONS AND THE EXPECTED VALUE OF FUNCTIONS OF RANDOM VARIABLES
Proof. Using (4.2), we see that

Z b2 Z b1
F (b1 , b2 ) F (b1 , a2 ) F (a1 , b2 ) + F (a1 , a2 ) =
a2
a1
f (y1 , y2 )dy1 dy2 .
Bu from (4.1), the LHS here is P [a1 < Y1 b1 , a2 < Y2 b2 ]. Thus, (4.3) is verified.
Note that if A in (4.4) is taken to be the rectangle [a1 , b1 ] [a2 , b2 ], then (4.3) results.
So (4.4) is true when A is a rectangle.
4.2
Marginal Distributions and the Expected Value of Functions of Random Variables.
Proposition 4.2.1. Let Y1 and Y2 be r.v.s.

(1) If
P Y1 and Y2 are discrete with jpf f (y1 , y2 ), then Y1 hasPpf f1 (y1 ) = P [Y1 = y1 ] =
y2 f (y1 , y2 ) and Y2 has pf f2 (y2 ) = P [Y2 = y2 ] =
y1 f (y1 , y2 ). f1 (y1 ) and
f2 (y2 ) are called the marginal probability functions of Y1 and Y2 respectively.
(2) If Y1 and Y2 are jointly continuous with jdf f (y1 , y2 ), then
Z +
Y1 is continuous with df f1 (y1 ) =
f (y1 , y2 )dy2 , and
Z +
Y2 is continuous with df f2 (y2 ) =
f (y1 , y2 )dy1 .
f1 (y1 ) and f2 (y2 ) are called the marginal density functions of Y1 and Y2 respectively.
Proof.
P
(1) Since {Y1 = y1P
} = y2 RY2 {Y1 = y1 , Y2 = y2 }, then P [Y1 = y1 ] = y2 P [Y1 =
y1 , Y2 = y2 ] = y2 f (y1 , y2 ).
o
R y1 nR +
(2) P [Y1 y1 ] = P [Y1 y1 , < Y2 < +] =
f
(t
,
t
)dt
dt1 . By defi1
2
2
R +
nition, Y1 is continuous with df f (y1 , y2 )dy2 . Here, we applied (4.4) with A
taken to be the rectangle A = (, y1 ] (, +).
54
Definition.
g(Y1 , Y2 ).
Let g(y1 , y2 ) be a function of y1 and y2 and consider the random variable
(1) If Y1 and Y2 are discrete with joint probability function f (y1 , y2 ), then the expected value of g(Y1 , Y2 ) is
E[g(Y1 , Y2 )] =
X X
g(y1 , y2 )f (y1 , y2 ).
all y2 all y1
(2) If Y1 and Y2 are jointly continuous with joint density function f (y1 , y2 ), then the
expected value of g(Y1 , Y2 ) is
Z + Z +
E[g(Y1 , Y2 )] =
4.2.1
g(y1 , y2 )f (y1 , y2 )dy1 dy2 .
Special Theorems.
Proposition 4.2.2. Let Y1 and Y2 be r.v.s.

(1) If c is a constant, then E(c) = c.
(2) E[cg(Y1 , Y2 )] = cE[g(Y1 , Y2 )].
(3) E[g1 (Y1 , Y2 ) + + gk (Y1 , Y2 )] = E[g1 (Y1 , Y2 )] + + E[gk (Y1 , Y2 )].
Remarks. In particular, for any random variables Y1 , . . . , Yk and constants c1 , . . . , ck ,

we always have
E[c1 Y1 + + ck Yk ] = c1 E(Y1 ) + + ck E(Yk ).
4.2.2
Covariance.
Definition.
Let Y1 and Y2 be r.v.s with means 1 and 2 respectively. Then

Cov(Y1 , Y2 ) = E[(Y1 1 )(Y2 2 )]
is called the covariance between Y1 and Y2 . If 12 and 22 denote the variances of Y1 and
Y2 , then
Cov(Y1 , Y2 )
=
1 2
is called the correlation coefficient between Y1 and Y2 . Y1 and Y2 are called uncorrelated
if Cov(Y1 , Y2 ) = 0 (or = 0).
Remarks. Cov(Y1 , Y2 ) = Cov(Y2 , Y1 ). If Y1 = Y2 , then Cov(Y1 , Y2 ) = Var(Y1 ). If Y2 is
constant, then Cov(Y1 , Y2 ) = 0.
Proposition 4.2.3. Cov(Y1 , Y2 ) = E(Y1 Y2 ) 1 2 .
Proof. We have
E[(Y1 1 )(Y2 2 )] = E[Y1 Y2 2 Y1 1 Y2 + 1 2 ] = E(Y1 Y2 ) + E(2 Y1 ) + E(1 Y2 ) + E(1 2 )
= E(Y1 Y2 ) 2 E(Y1 ) 1 E(Y2 ) + 1 2 = E(Y1 Y2 ) 1 2
Example. Random variables X and Y have joint probability function given by the following table:
-1
1
2
-2
.10
.05
.20
x
0
.25
.15
.10
3
.10
.05
Find:
(1) the marginals,
(2) P {X Y },
(3) E(X 2 Y ),
(4) Cov(X, Y ),
(5) the correlation coefficient XY .
(6) the moment generating function of Y .
Solution.
(1)
x
-2
-1
g(x)
.35
.5
.15
h(y)
.45
.25
.30
P
(2) Let A = {(x, y) : x y}. Then P [X Y ] = P [(X, Y ) A] = (x,y)A f (x, y) =
f (2, 1) + f (2, 1) + f (2, 2) + f (0, 1) + f (0, 2) = .6.
P P
(3) E(X 2 Y ) = x y x 2 yf (x, y) = (4.1)+(9.1)+(4.05)+(9.05)+(8.2) =
0.95.
56
P P
(4) E(XY ) = x y xyf (x, y) = (2.1)+(3.1)+(2.05)+(3.05)+(4.2) =
.85. Also, E(X) = .25 and E(Y ) = .4, so Cov(X, Y ) = E(XY ) E(X)E(Y ) = .95.
(5) E(X 2 ) = 2.75, so Var(X) = 2.75 (.25)2 = 2.68754. Also, E(Y 2 ) = 1.9, so
Var(Y ) = 1.74. Hence
XY =
(6) MY (t) = E[etY ] =
.95
= .4393.
2.68754 1.74
ety P [Y = y] = .45et + .25et + .3e2t .
Remark. Let Y1 and Y2 be jointly continuous with jdf f (y1 , y2 ), and let R = {(y1 , y2 )
R2 : f (y1 , y2 ) > 0}. Then for any A R2 , we have
ZZ
A
ZZ
f (y1 , y2 )dy1 dy2 =
AR
f (y1 , y2 )dy1 dy2 .
RR
RR
2
Reason.
If
A,
B
R
and
AB
=
,
then
f
(y
,
y
)dy
dy
=
1
2
1
2
AB
A f (y1 , y2 )dy1 dy2 +
RR
f
(y
,
y
)dy
dy
.
If
R
=
{(y
,
y
)
:
f
(y
,
y
)
>
0},
then
for
any A R2 , we
1
2
1
2
1 RR2
1
2
B
RR
c
have
A = (A R) (A
RR R ), so A f (y1 , y2 )dy1 dyRR2 = AR f (y1 , y2 )dy1 dy2 +
RR
AR c f (y1 , y2 )dy1 dy2 = AR f (y1 , y2 )dy1 dy2 + 0 = AR f (y1 , y2 )dy1 dy2 .
Example. Random variables X and Y are jointly continuous with joint density function
x + y,
f (x, y) =
0,
if 0 < x < 1, 0 < y < 1;

otherwise.
Find:
(1) the marginals,
(2) P {X 2 Y },
(3) E(X 2 Y ),
(4) Cov(X, Y ),
(5) the correlation coefficient XY .
(6) the moment generating function of Y .
Solution.
(1) The marginal density function of X is
Z +
0
fX (x) =
f (x, y)dy = R 1
(x + y)dy = [xy +
if x (0, 1),
y2
1 ]
2 0
=x+
1
2
if x (0, 1).
By symmetry, we have
0
fY (y) =
y +
if y (0, 1),
1
2
if y (0, 1).
(2) With A = {(x, y) R2 : x 2 > y}, we have

ZZ
ZZ
2
P [X > Y ] = P [(X, Y ) A] =
f (x, y)dydx =
(x + y)dydx
A
AR
#
#
Z1"
Z1
Z 1 "Z x 2
y2
x4
7
x2
3

(x + y)dy dx =
xy +
dx
=
[x
+
]dx =
.
=
0
2
2
20
0
0
0
0
Note: B in figure 4.3 is A R.
1.4
1.2
1.0
0.8
fig. 4.3
0.6
0.4
0.2
- 0.2
0.2
0.4
0.6
0.8
1.0
1.2
(3) We have
2
Z + Z +
Z1Z1
x yf (x, y)dxdy =
x 2 y(x + y)dxdy
0 0
#
1 #
Z1" 4
Z1
3

x
x
y
y2
3
2 2
2
=
x y + x y dx dy =
y+
y dy =
+
dy
4
3
3
0
0
0
0 4
0
1
y2 y3
= 17 .
=
+
8
9 0
72
E[X Y ] =

Z 1 "Z 1
58
(4) We have
Z1Z1
Z + Z +
xy(x + y)dxdy
xyf (x, y)dxdy =
0 0
#
"
1 #
Z1
Z1
3
2

x
x
y2
y
2
2
2
x y + xy dx dy =
=
y+
y dy =
+
dy
3
2
2
0
0
0
0 3
0
1
y2 y3
= 1.
=
+
6
6 0
3
E[XY ] =

Z 1 "Z 1
We also have
Z1
E(X) =
0
1
7
x(x + )dx =
,
2
12
Z1
E(X ) =
x 2 (x +
5
7
and E(Y 2 ) = 12
. Hence X2 = Y2 =
and so also E(Y ) = 12
1
Cov(X, Y ) = E(XY ) (EX)(EY ) = 144
, so
5
12
1
5
)dx =
,
2
12
7 2
( 12
) =
XY
Cov(X, Y )
1
q
= q 144
=
= .
11
11
X Y
11
144
(5) MY (t) =
R1
0
ety (y + 12 )dy =
tet et +1
t2
144
after an integration by parts.
Example. (no. 5.9, p. 233) Let Y1 and Y2 have jdf
k(1 y ) if 0 y y 1 (i.e. if (y , y ) R),

2
1
2
1
2
f (y1 , y2 ) =
0
elsewhere.
Find
(1) k,
(2) the marginals,
(3) P [Y1 34 , Y2 12 ].
Solution.
11
.
144
Finally,
4.3. CONDITIONAL PROBABILITY AND DENSITY FUNCTIONS.
59
Note that B (the shaded portion of the triangle) is A R.

(1) We have
ZZ
Z 1 Z y2
Z1
Z y2

1=
k(1 y2 )dy1 dy2 = k
1 y2 dy1 dy2 = k (1 y2 )
1dy1 dy2
R
0 0
0
0
Z1
Z1
Z1
k
= k (1 y2 )y2 dy2 = k[ y2 dy2
y22 dy2 ] = ,
6
0
0
0
so k = 6.
R1
6(1 y2 )dy2 = 6(1 y1 )2 . Hence
6(1 y )2 if y [0, 1],

1
1
f1 (y1 ) =
0
otherwise.
R y2
If y2 [0, 1], then f2 (y2 ) = 0 6(1 y2 )dy1 = 6y2 (1 y2 ). Hence
6y (1 y ) if y [0, 1],
2
2
2
f2 (y2 ) =
0
otherwise.
(2) If y1 [0, 1], then f1 (y1 ) =
y1
(3) Let A = {(y1 , y2 ) : y1 43 , y2 12 }.

ZZ
ZZ
3
1
P [Y1 , Y2 ] = P [(Y1 , Y2 ) A] =
f (y1 , y2 )dy2 dy1 = 6
(1 y2 )dy2 dy1
4
2
A
AR
Z 1/2 Z 1
Z 3/4 Z 1
=
6(1 y2 )dy2 dy1 +
6(1 y2 )dy2 dy1
0
1/2
1/2
y1
31
=
.
64
4.3
Conditional Probability and Density Functions.
Definition.
Let Y1 and Y2 be r.v.s.
(1) If Y1 and Y2 are discrete with jpf f (y1 , y2 ), if f2 (y2 ) is the marginal pf of Y2 , and
if y2 is such that f2 (y2 ) > 0, then
f12 (y1 |y2 ) = P [Y1 = y1 |Y2 = y2 ] =
P [Y1 = y1 , Y2 = y2 ]
f (y1 , y2 )
=
P [Y2 = y2 ]
f2 (y2 )
(as a function of y1 ) is called the conditional probability function of Y1 given that

Y2 = y2 .
(2) If Y1 and Y2 are jointly continuous with jdf f (y1 , y2 ), if f2 (y2 ) is the marginal df
of Y2 , and if y2 is such that f2 (y2 ) > 0, then
f12 (y1 |y2 ) =
f (y1 , y2 )
f2 (y2 )
(as a function of y1 ) is called the conditional density function of Y1 given that

Y2 = y2 .
60
Remark. In either case, f12 (y1 |y2 ) is not defined for a y2 with f2 (y2 ) = 0. For a fixed
y2 for which f12 (y1 |y2 ) is defined, it is a probability function (density function) as a
function of y1 . This is because
X
f12 (y1 |y2 ) =
y1
X f (y1 , y2 )
f2 (y2 )
y1
X
f2 (y2 )
1
f (y1 , y2 ) =
= 1,
f2 (y2 ) y1
f2 (y2 )
and
Z +
Z +
f12 (y1 |y2 )dy1 =
f (y1 , y2 )
1
dy1 =
f2 (y2 )
f2 (y2 )
Z +
f (y1 , y2 )dy1 =
f2 (y2 )
= 1.
f2 (y2 )
Definition. Let Y1 and Y2 be two r.v.s on the same sample space. Let y2 be such that
f2 (y2 ) > 0 so that the conditional probability (density) function f12 (y1 |y2 ) is defined.
Then
P
if Y1 is discrete,
y g(y1 )f12 (y1 |y2 )
E[g(Y1 )|Y2 = y2 ] = R +1
if Y1 is continuous,
g(y1 )f12 (y1 |y2 )dy1
is called the conditional expectation of g(Y1 ) given that Y2 = y2 .
Example. Random variables X and Y are jointly continuous with joint density function
x + y, if 0 < x < 1, 0 < y < 1;

f (x, y) =
0,
otherwise.
Find E(X|Y = .5).
Solution. The marginal density function of Y was previously found to be
0
if y (0, 1),
fY (y) =
y + 1 if y (0, 1).
2
Hence the conditional density function f (x|y) is undefined if y (0, 1). If y (0, 1),
then
x+y
f (x, y) 1 +y if x (0, 1),
f (x|y) =
= 2
0
fY (y)
otherwise.
In particular,
x + .5 if 0 < x < 1,
f (x|.5) =
0
otherwise,
so E(X|Y = .5) =
R1
0
x(x + 2 )dx =
7
.
12
4.4. INDEPENDENT RANDOM VARIABLES.
61
Remark. To say that a point W is chosen at random from the interval [a, b] means
that the r.v. W is uniformly distributed on [a, b].
Example. A point X is chosen at random from the interval (0, 1). Given that X = x, a
second point Y is chosen at random from the interval (0, x). Find E[X|Y = 32 ].
Solution. We have
1
g(x) =
0
if 0 < x < 1,
otherwise,
f (y|x) =
Hence
f (x, y) = f (y|x)g(x) =
so
Z +
h(y) =
if 0 < y < x,
otherwise.
if 0 < y < x < 1,
otherwise,
R
1 1 dx = log y
y x
f (x, y)dx =
0
if 0 < y < 1,
otherwise,
and then
undefined if y (0, 1),

f (x, y) 1
f (x|y) =
= x log y if 0 < y < x < 1
h(y)
0 otherwise.
Finally,
Z1
E[X|Y = y] =
Z1
xf (x|y)dx =
x
y
and thus
E[X|Y =
4.4
1
y 1
dx =
x log y
log y
for 0 < y < 1,
2
1
]=
.
3
3 log 32
Independent Random Variables.
Definition. Two random variables Y1 and Y2 are independent if for any A1 , A2 R,

the events {Y1 A1 } and {Y2 A2 } are independent (as defined in chapter 1); i.e.
P [{Y1 A1 } {Y2 A2 }] = P [Y1 A1 ]P [Y2 A2 ].
Proposition 4.4.1. If Y1 and Y2 are independent, then so are any functions g(Y1 ) and
h(Y2 ) of Y1 and Y2 .
Proof. For we can write {g(Y1 ) A1 } = {Y1 g 1 (A1 )} and {h(Y2 ) A2 } = {Y2
h1 (A2 )} and these two are independent.
62
Proposition 4.4.2. Two random variables Y1 and Y2 are independent iff F (y1 , y2 ) =
F1 (y1 )F2 (y2 ) for all y1 , y2 R. Here, F1 (y1 ) = F (y1 , +) = limy2 F (y1 , y2 ) is the
marginal distribution function of Y1 and F2 (y2 ) is the marginal distribution function of
Y2 .
Proposition 4.4.3. Two discrete random variables Y1 and Y2 with jpf f (y1 , y2 ) and
marginals f1 (y1 ) and f2 (y2 ) are independent iff f (y1 , y2 ) = f1 (y1 )f2 (y2 ) for all y1 , y2
R.
Two jointly continuous random variables Y1 and Y2 with jdf f (y1 , y2 ) and marginal
density funcions f1 (y1 ) and f2 (y2 ) are independent iff f (y1 , y2 ) = f1 (y1 )f2 (y2 ) for all
y1 , y2 R.
Problem. Show that if Y1 and Y2 are independent, then f12 (y1 |y2 ) = f1 (y1 ) and
E[g(Y1 )|Y2 = y2 ] = E[g(Y1 )].
Proposition 4.4.4. Let Y1 and Y2 be independent r.v.s and let g(Y1 ) and h(Y2 ) be functions only of Y1 and only of Y2 , respectively. Then
E[g(Y1 )h(Y2 )] = E[g(Y1 )]E[h(Y2 )],
provided these expectations exist. (Recall that E(X) is said to exist if E(|X|) < .)
Proof. Consider the case where Y1 and Y2 are discrete. We have
X X
X X
E[g(Y1 )h(Y2 )] =
g(y1 )h(y2 )f (y1 , y2 ) =
g(y1 )h(y2 )f1 (y1 )f2 (y2 )
all y2 all y1
all y2 all y1
h(y2 )f2 (y2 )
all y2
= E[g(Y1 )]
g(y1 )f1 (y1 ) =
all y1
h(y2 )f2 (y2 )E[g(Y1 )]
all y2
h(y2 )f2 (y2 ) = E[g(Y1 )]E[h(Y2 )].
all y2
Proposition 4.4.5. If Y1 and Y2 are independent, then Y1 and Y2 are uncorrelated.

Proof. Cov(Y1 , Y2 ) = E(Y1 Y2 ) 1 2 = E(Y1 )E(Y2 ) 1 2 = 0.
Example: random variables which are uncorrelated but not independent. X and Y
have jpf given by
x
-1
0
1
-1
1
8
1
8
1
8
1
8
1
8
1
8
1
8
0
1
8
4.5. THE EXPECTED VALUE AND VARIANCE OF LINEAR FUNCTIONS OF RANDOM VARIABLES.63
We calculate E(XY ) = 0. The
3
8
2
g(x) = P [X = x] = 8
3
8
marginals are
if x = 1,
if x = 0,
h(y) = P [Y = y] =
if x = 1,
8
2
8
3
8
if y = 1,
if y = 0,
if y = 1,
Hence E(X) = E(Y ) = 0, so Cov(X, Y ) = 0. But f (0, 0) = 0 g(0)h(0), so X and Y are

not independent.
Proposition 4.4.6. If X and Y are independent, then MX+Y (t) = MX (t)MY (t).
Proof. MX+Y (t) = E[et(X+Y ) ] = E[etX etY ] = E[etX ]E[etY ] = MX (t)MY (t).
4.5
The Expected Value and Variance of Linear Functions

of Random Variables.
Proposition 4.5.1. Let X1 , X2 , . . . , Xm and Y1 , Y2 , . . . , Yn be random variables on the same

sample space S, and let a1 , a2 , . . . , am and b1 , b2 , . . . , bn be numbers. Define
m
X
U=
a i Xi
and
V =
i=1
n
X
bj Yj .
j=1
Then
(1) E(U) =
Pm
i=1
ai E(Xi ).
Pm Pn
(2) Cov(U, V ) = i=1 j=1 ai bj Cov(Xi , Yj ).

Pm
PP
(3) Var(U) = i=1 a2i Var(Xi ) + 2
ai aj Cov(Xi , Xj ).
i<j
Proof.
(1) We have already seen this.
y
(2) Let ix = E(Xi ) and j = E(Yj ). Since U E(U ) =

Pn
y
j=1 bj (Yj j ), then
[U E(U)][V E(V )] =
m X
n
X
Pm
i=1
ai (Xi ix ) and V E(V ) =
ai bj (Xi ix )(Yj j ),
i=1 j=1
and so
Cov(U, V ) = E
m X
n
X
ai bj (Xi
ix )(Yj
i=1 j=1
m X
n
X
i=1 j=1
ai bj Cov(Xi , Yj ).
y
j )
m X
n
X
i=1 j=1
ai bj E[(Xi ix )(Yj j )]
64
(3) From (2), we have

Var(U) = Cov(U, U) =
m X
m
X
ai aj Cov(Xi , Xj ) =
XX
XX
ai aj Cov(Xi , Xj ) +
ai aj Cov(Xi , Xj )
i=1 j=1
m
X
a2i Var(Xi ) + 2
i=1
i,j:i=j
i,j:ij
XX
ai aj Cov(Xi , Xj )
i<j
Examples.
(1) Cov(X, Y + Z) = Cov(X, Y ) + Cov(X, Z).
(2) Cov(aX, bY ) = abCov(X, Y ).
(3) Cov(3X 2Y , X + 5Y ) = Cov(3X 2Y , X) + Cov(3X 2Y , 5Y ) = Cov(3X, X) +
Cov(2Y , X)+Cov(3X, 5Y )+Cov(2Y , 5Y ) = 3Cov(X, X)+2Cov(Y , X)+15Cov(X, Y )
10Cov(Y , Y ) = 3Var(X) + 17Cov(X, Y ) 10Var(Y ).
Corollary 4.5.2. Suppose X1 , . . . , Xm are uncorrelated. Then
Var(U ) =
m
X
a2i Var(Xi ).
i=1
In particular, the variance of a sum of independent random variables is the sum of the
variances of the random variables.
Example. Let X and Y be independent random variables with distributions N(2, 3)
and N(3, 2) respectively. Let
Z = 2X + Y ,
W = X 3Y .
Find
(i) the distribution of Z
(ii) the correlation coefficient between Z and W .
Solution.
(1) We have MX (t) = e2t+
3t 2
2
and MY (t) = e3t+t , so

2
MZ (t) = M2X (t)MY (t) = MX (2t)MY (t) = e4t+6t e3t+t = e7t+7t ,

so Z N(7, 14).
(2)
Cov(Z, W ) = Cov(2X + Y , X 3Y ) = Cov(2X, X 3Y ) + Cov(Y , X 3Y )
= Cov(2X, X) + Cov(2X, 3Y ) + Cov(Y , X) + Cov(Y , 3Y )
= 2Cov(X, X) 6Cov(X, Y ) + Cov(Y , X) 3Cov(Y , Y )
= 2Var(X) 0 + 0 3Var(Y ) = 6 6 = 0.
Hence ZW =
Cov(Z,W )
Z W
= 0.
4.5. THE EXPECTED VALUE AND VARIANCE OF LINEAR FUNCTIONS OF RANDOM VARIABLES.65
Example. X and Y are jointly continuous with joint density
c(x + y)ex if 0 < x < +, 0 < y < 1,

f (x, y) =
0
otherwise.
Find (i) c (ii) the marginals g(x) and h(y) (iii) P [X Y ] (iv) Cov(X, Y ) (v) (vi) E(X|Y =
y).
Solution.
Let R = {(x, y) : 0 < x < +, 0 < y < 1}. Note that B (the triangle) is A R.
R + R +
(i) c must be chosen so that f (x, y)dxdy = 1. We have
Z + Z +
Z 1 Z +
1=
f (x, y)dxdy = c
(x + y)ex dxdy

0 0
"Z Z
#
Z Z
1
xex dxdy +
=c
0
yex dxdy
1
3
= c[1 + ] = c,
2
2
so c = 32 .
(ii) We have
0
if x < 0,
g(x) =
f (x, y)dy = R 1 2
2
1
x
dy = 3 ex (x + 2 ) if x 0,
0 3 (x + y)e
Z +
0
if y (0, 1),
h(y) =
f (x, y)dx = R + 2
2
x
(x + y)e dx = 3 (1 + y) if y (0, 1).
0
3
Z +
(iii) Let A = {(x, y) : x < y}. Then

ZZ
ZZ
P [X < Y ] = P [(X, Y ) A] =
f (x, y)dxdy =
A
Z1Zy
f (x, y)dxdy =
AR
2
3
= [5e1 ] = 0.226,
3
2
so
P [X Y ] = 1 P [X < Y ] = 0.774.
2
(x + y)ex dxd
3
66
(iv)
Z +
x =
Z +
2
1
2
x(x + )ex dx =
3
2
3
xg(x)dx =
0
Z +
2 x
x e
0
1
dx +
3
Z +
xex dx
1
5
2
= E(X 2 ) + E(X) = ,
3
3
3
where X Exp( = 1).
Z +
y =
2
yh(y)dy =
3
Z1
y(1 + y)dy =
0
5
.
9
#
"Z Z
Z 1 Z +
1 +
2
2
x 2 yex dxdy +
xy 2 ex dxdy
E(XY ) =
xy (x + y)ex dxdy =
3
3
0 0
0 0
0 0
" Z
#
Z
2 1 + 2 x
1 +
2
1
8
x
=
x e dx +
xe dx = [1 + ] = .
3 2 0
3 0
3
3
9
Z 1 Z +
Hence Cov(X, Y ) =
8
9
1
( 53 59 ) = 27
.
(v)
"Z
#
Z +
Z +
+
2
1
2
1
x 2 g(x)dx =
E(X 2 ) =
x 2 (x + )ex dx =
x 3 ex dx +
x 2 ex dx
3 0
2
3
2 0
0

1
14
2
(4) + (3) =
,
=
3
2
3
Z +
and
Z +
E(Y ) =
Hence x2 =
14
3
25
9
17
,
9
xy
2
y h(y)dy =
3
2
y2 =
7
18
25
81
Z1
y 2 (1 + y)dy =
13
,
162
7
.
18
so
1
27
Cov(X, Y )
=
=q q
= 0.095.
17
13
x y
9
162
(vi) We have
(x+y)ex
f (x, y) 1+y
(x|y) =
=
0
h(y)
Hence
0
E(X|Y = y) = R
if x 0, 0 < y < 1,
otherwise.
if y (0, 1),
x(x+y)ex
dx
1+y
2+y
1+y
if y (0, 1).
Proposition 4.5.3. 1 1 and || = 1 iff P [aX + bY = c] = 1.
4.6. THE MULTINOMIAL DISTRIBUTION.
67
Proof. Assume X > 0 and Y > 0 and define

W =
X
Y
.
X
Y
Then
X
0 Var(W ) = Var
X

X Y
2Cov
,
X Y

Y
+ Var
Y
2
= 1 2 2 + 2 = 1 2 ,
so 2 1. Next, note that

= 1 Var(W ) = 0 P

X
Y
= c = 1,
X
Y
and
= 1 P [aX + bY = c] = 1 where a and b have the same sign,
= 1 P [aX + bY = c] = 1 where a and b have opposite signs.
Remark.
4.6
If P [X = constant] = 1, then Cov(X, Y ) = 0.
The Multinomial Distribution.
We have an experiment E with k possible outcomes S1 , . . . , Sk (Sj is called a successof

type j) having probabilities p1 , . . . , pk of occurring , where p1 + + pk = 1. We
repeat this experiment n times, in such a way that whatever happens on any one trial
is independent of what happens on any other. That is, we have n independent trials of
E.
Let Xj =the number of Sj s observed over the n trials, j = 1, 2, . . . , k, so that X1 +
+ Xk = n.
What is the joint distribution of X1 , . . . , Xk ?
Answer:
P [X1 = x1 , . . . , Xk = xk ] =
x
n!
p 1
x1 !xk ! 1
pk k
if x1 , . . . , xk 0, x1 + + xk = n,
otherwise
X1 , . . . , Xk are said to have the multinomial distribution with parameters n, p1 , . . . , pk .

Notice that if k = 2, this is just the binomial distribution.
Facts:
(1) Each Xj Binom(n, pj ) i.e. has the binomial distribution with parameters n and
p = pj . The outcomes would be S = Sj and F =anything else. Similarly, any
subset of X1 , . . . , Xk will have a multinomial distribution.
68
Example. A large box of washers consists of 50% 41 " washers, 30% 18 " washers, and
20% 38 " washers. Ten washers are chosen at random, with replacement. What is the
probability of getting
(1) five 14 " washers, four 18 " washers, and one 38 " washer?
(2) exactly six 81 " washers?
(3) at most two kinds of washers among the chosen ones?
Solution. With obvious notation, we have
(1) P [X1 = 5, X2 = 4, X3 = 1] =
(2) P [X2 = 6] =
10!
.36 .74
6!4!
10!
.55 .34 .21
5!4!1!
= .064.
= .0368.
(3) Let A1 = {X1 = 0}, A2 = {X2 = 0}, A3 = {X3 = 0}, and A = {at most two kinds}.
Then A = A1 A2 A3 , so
P (A) = P (A1 ) + P (A2 ) + P (A3 ) P (A1 A2 ) P (A1 A3 ) P (A2 A3 ) + P (A1 A2 A3 )
= .510 + .710 + .810 .210 .310 .510 + 0 = .1356157.
4.7
4.7.1
More than Two Random Variables.

Definitions.
Definition.
function
Let Y1 , Y2 , . . . , Yn be n discrete r.v.s on the same sample space S. The
f (y1 , y2 , . . . , yn ) = P [Y1 = y1 , Y2 = y2 , . . . , Yn = yn ],
< y1 , y2 , . . . , yn < +
is called the joint probability function (jpf) of Y1 , Y2 , . . . , Yn .

Proposition 4.7.1. The jpf f (y1 , y2 , . . . , yn ) of Y1 , Y2 , . . . , Yn satisfies
(1) f (y1 , y2 , . . . , yn ) 0 for all y1 , y2 , . . . , yn R,
P
(2) y1 ,y2 ,...,yn f (y1 , y2 , . . . , yn ) = 1, where the sum is over all (y1 , y2 , . . . , yn ) for
which f (y1 , y2 , . . . , yn ) > 0.
Definition. Let Y1 , Y2 , . . . , Yn be (any kind of) r.v.s defined on the same sample space
S. The function
F (y1 , y2 , . . . , yn ) = P [Y1 y1 , Y2 y2 , . . . , Yn yn ],
< y1 , y2 , . . . , yn < +,
is called the joint distribution function (JDF) of Y1 , Y2 , . . . , Yn .
4.7. MORE THAN TWO RANDOM VARIABLES.
69
Definition. Let Y1 , Y2 , . . . , Yn be r.v.s defined on the sample space S. If there exists a

function f (y1 , y2 , . . . , yn ) such that
Z y1 Z y2
Z yn
F (y1 , y2 , . . . , yn ) =
f (t1 , t2 , . . . , tn )dtn dt2 dt1 , < y1 , y2 , . . . , yn < +,
we say that Y1 , Y2 , . . . , Yn are jointly continuous with joint density function (jdf) f (y1 , y2 , . . . , yn ).
Proposition 4.7.2. Let Y1 , . . . , Yn be jointly continuous r.v.s with jdf f (y1 , . . . , yn ). Then
(1) f (y1 , . . . , yn ) 0 for all y1 , . . . , yn R,
R +
R +
(2) f (y1 , . . . , yn )dy1 dyn = 1.
Proposition 4.7.3. Let Y1 , . . . , Yn be jointly continuous r.v.s with jdf f (y1 , . . . , yn ). If
A Rn , then
Z +
Z +
P [(Y1 , . . . , Yn ) A] =
f (y1 , . . . , yn )dy1 dyn .
4.7.2
Marginal and Conditional Distributions.
We will illustrate with a few examples using n = 4.

Examples (marginal).
(1) Let Y1 , Y2 , Y3 , Y4 be jointly discrete r.v.s. with jpf f (y1 , y2 , y3 , y4 ). Then the jpf of
Y1 and Y3 is
XX
f13 (y1 , y3 ) =
f (y1 , y2 , y3 , y4 ).
y2 y4
The jpf of Y2 , Y3 , and Y4 is

f234 (y2 , y3 , y4 ) =
f (y1 , y2 , y3 , y4 ).
y1
The pf of Y2 is
f2 (y2 ) =
XXX
f (y1 , y2 , y3 , y4 ).
y1 y3 y4
(2) Let Y1 , Y2 , Y3 , Y4 be jointly continuous r.v.s. with jdf f (y1 , y2 , y3 , y4 ). Then the
jdf of Y1 and Y3 is
Z + Z +
f13 (y1 , y3 ) =
f (y1 , y2 , y3 , y4 )dy2 dy4 .
The jdf of Y2 , Y3 , and Y4 is

Z +
f234 (y2 , y3 , y4 ) =
f (y1 , y2 , y3 , y4 )dy1 .
The df of Y2 is
Z + Z + Z +
f2 (y2 ) =
f (y1 , y2 , y3 , y4 )dy1 dy3 dy4 .
70
Examples (conditional).
(1) Let Y1 , Y2 , Y3 , Y4 be r.v.s which are either discrete or jointly continuous with f (y1 , y2 , y3 , y4 ).
If y2 , y4 are fixed and such that f24 (y2 , y4 ) > 0, then
def
f13|24 (y1 , y3 |y2 , y4 ) =
f (y1 , y2 , y3 , y4 )
f24 (y2 , y4 )
is the conditional pf (df) of Y1 , Y3 given that Y2 = y2 and Y4 = y4 . If f24 (y2 , y4 ) =

0, then f13|24 (y1 , y3 |y2 , y4 ) is not defined.
If Y1 , Y2 , Y3 , Y4 are discrete, then
f (y1 , y2 , y3 , y4 )
P [Y1 = y1 , Y2 = y2 , Y3 = y3 , Y4 = y4 ]
=
= P [Y1 = y1 , Y3 = y3 |Y2 = y2 , Y4 = y4
f24 (y2 , y4 )
P [Y2 = y2 , Y4 = y4 ]
Proposition 4.7.4. Let Y1 , Y2 , . . . , Yn be r.v.s with joint probability (or density) function
f (y1 , y2 , . . . , yn ). Let f1 (y1 ), f2 (y2 ), . . . , fn (yn ) be the marginal probability (density)
functions of Y1 , Y2 , . . . , Yn respectively. Then Y1 , Y2 , . . . , Yn are independent iff
f (y1 , y2 , . . . , yn ) = f1 (y1 )f2 (y2 ) fn (yn )
for all y1 , y2 , . . . , yn R.
4.7.3
Expectations and Conditional Expectations.
Definition.
Let g(Y1 , . . . , Yn ) be a function of the random variables Y1 , . . . , Yn .
(1) If Y1 , . . . , Yn are discrete with joint probability function f (y1 , . . . , yn ), then the
expected value of g(Y1 , . . . , Yn ) is
X
X X
E[g(Y1 , . . . , Yn )] =
g(y1 , y2 , . . . , yn )f (y1 , y2 , . . . , yn ).
all yn
all y2 all y1
(2) If Y1 , . . . , Yn are jointly continuous with joint density function f (y1 , . . . , yn ), then
the expected value of g(Y1 , . . . , Yn ) is
Z +
Z + Z +
E[g(Y1 , . . . , Yn )] =
g(y1 , y2 , . . . , yn )f (y1 , y2 , . . . , yn )dy1 dy2 dyn .
Example of Conditional Expectations. Let Y1 , Y2 , Y3 , Y4 be r.v.s which are either discrete or jointly continuous with f (y1 , y2 , y3 , y4 ). If y2 , y4 are fixed and such that
f24 (y2 , y4 ) > 0, then
P P
if Y1 , Y2 , Y3 , Y4 are d
y
y g(y1 , y3 )f13|24 (y1 , y3 |y2 , y4 )
E[g(Y1 , Y3 )|Y2 = y2 , Y4 = y4 ] = R +1 R +3
if Y1 , Y2 , Y3 , Y4 are j
g(y1 , y3 )f13|24 (y1 , y3 |y2 , y4 )dy1 dy3
Chapter 5
Functions of Random Variables.
5.1
5.1.1
Functions of Continuous Random Variables.

The Univariate Case.
Proposition 5.1.1. Let X be a continuous r.v. with density function fX (x), and let y =
(x) be a differentiable function which is either strictly increasing or strictly decreasing
on {x|fX (x) > 0}. Define Y = (X). Then Y has density function

dx

.
fY (y) = fX (x)
dy
(or equivalently

d1 (y)

.)
fY (y) = fX (1 (y))

dy
Proof.
P [X 1 (y)]
P [Y y] = P [(X) y] =
P [X 1 (y)]
F (1 (y))
X
=
if is decreasing
1 FX (1 (y))
if is increasing
Differentiating, we see that
F 0 (1 (y)) d1 (y)
if is inc.
d
X
dy
P [Y y] =
fY (y) =
d1 (y)
0
1
dy
FX ( (y)) dy
if is dec.

fX (x) dx
dx
if is inc.

dy

,
=
=
f
(x)
X
fX (x) dx if is dec.
dy
dy
where we used the fact that is increasing iff 1 is increasing.

Example. Suppose X Unif(0, 1). Find the distribution of Y = 2 log X.
71
if is inc.,
if is dec.
72
CHAPTER 5. FUNCTIONS OF RANDOM VARIABLES.

dx
dy
= x2 for x > 0. Also,
Solution. We have y = (x) = 2 log x, so dx = x2 , so dy
RY = (0, ). Hence for y (0, ),

dx
x
1

= 1 = ey/2 ,
fY (y) = fX (x)
dy
2
2
so Y Exp(mean=2).
Remark. If x1 , . . . , xn are values from independent r.v.s uniform on (0, 1) (easy to
generate on a computer), then y1 , . . . , yn (where yi = 2 log xi ) are values from independent r.v.s having exponential distributions with mean 2.
Examples.
1 ey/2
(1) Suppose Y has distribution function FY (y) =
0
if y 0,
if y < 0.
FY (Y ). Then X Unif(0, 1). This is true in general (see below).
. Define X =
(2) Suppose X N(0, 1). Find the distribution of Y = X 2 .
Solution. Notice that the monotone assumption of the above proposition is not
satisfied. We have
Z y
p
p
p
1
2
2
ex /2 dx
P [Y y] = P [X y] = P [ y X y] = 2P [0 X y] = 2
2
0
(putting w = x 2 )
Zy
=2
0
1
dw
ew/2 =
2 w
2
Zy
0
w 1/2 ew/2 dw,

2
so Y Gamma( = 12 , = 2) (i.e. chi-square with 1 df).

Proposition 5.1.2. Let F (x) be a distribution function, and define
F 1 (u) = inf{x|F (x) u},
u (0, 1).
(1) let U be uniformly distributed on (0, 1). Then the random variable X = F 1 (U) has
distribution function F .
(2) If F is continuous and X is a random variable with distribution function F , then
U = F (X) is uniform on (0, 1).
Proof. Not difficult.
5.1. FUNCTIONS OF CONTINUOUS RANDOM VARIABLES.
5.1.2
73
The Multivariate Case.
Transformations of Joint Distributions. Let X and Y be jointly continuous with joint

density function fX,Y (x, y). Suppose that u(x, y) and v(x, y) are functions defined
at least on R(X,Y ) , and let U = u(X, Y ), V = v(X, Y ). Then the question is: what is the
joint distribution of U and V ?
In general, each case must be considered on its own merits, but we do have one
useful result, which appears below. Let D R2 be such that Pr{(X, Y ) D} = 1. For
example, we could take D = {(x, y) R2 |fX,Y (x, y) > 0}. Suppose that u(x, y) and
v(x, y) are defined at least on D and are such that the transformation T : D S =
T (D) R2 defined by (u, v) = T (x, y) = (u(x, y), v(x, y)) is one-to-one, and the
inverse transformation (x, y) = T 1 (u, v) has Jacobian
x
u
y
u
(x, y)
= det
(u, v)
x
v
y
v
where the partial derivatives are continuous on S. Then U and V have joint density
function given by
(x, y)
fU,V (u, v) = fX,Y (x, y)|
|.
(u, v)
Proof. We have
ZZ
Pr{(U, V ) A} = Pr{(X, Y ) T (A)} =

fX,Y (x, y)dxdy
T 1 (A)
ZZ
(x, y)
=
fX,Y (T 1 (u, v))|
|dudv,
(u, v)
A
from which the required formula follows.
Example. Let X and Y be independent each with distribution N(0, 1). Find the distribution of U = X/Y .
Solution. Define V = Y . Then x = uv, y = v, so
(x, y)
= det
(u, v)
x
u
y
u
x
v
y
v
v u
= det
0 1
!
= v.
RU = RV = R. Then
2
fU,V (u, v) = fX,Y (x, y)|

=
ev
2 (u2 +1)/2
(x, y)
ex /2 ey /2
e(x +y
|v| =
|=
(u, v)
2
2
2
|v|, < u, v < .
2 )/2
|v| =
e(u
2 v 2 +v 2 )/2
|v|
74
We want the marginal density function of U . It is

Z +
fU (u) =
=
Z +
fU,V (u, v)dv =
1
(u2
+ 1)
ev
2 (u2 +1)/2
Z0
|v|dv =
ev
2 (u2 +1)/2
Z +
(v)dv +
ev
, < u < .
So U has the Cauchy distribution. E(U ) does not exist.
5.2
5.2.1
Sums of Independent Random Variables.

The Discrete Case.
Example. Suppose X and Y are independent Poisson random variables with parameters x and y . Find the distribution of Z = X + Y .
Solution. Note that RZ = {0, 1, 2, . . .}. For z RZ , we have {X + Y = z} = zx=0 {X =
x, Y = z x} pairwise disjoint, so
P [Z = z] =
z
X
P [X = x, Y = z x] =
x=0
z
X
z
X
P [X = x]P [Y = z x]
x=0
z
zx
zx
xx ex y e y
x
e(x +y ) X
y
=
=
z! x
x!
(z x)!
z!
x! (z x)!
x=0
x=0
!
z
e(x +y ) X z
e(x +y )
xx zx
=
=
(x + y )z .
y
x
z!
z!
x=0
Thus X + Y Poisson(x + y ).
5.2.2
The Jointly Continuous Case.
Let X and Y have joint density function f (x, y). What is the distribution of Z = X + Y ?
Solution. With R = {(x, y) : x + y z}, we have
2 (u2 +1)/2
vdv
5.3. THE MOMENT GENERATING FUNCTION METHOD.
ZZ
75
Z + Z zy
P [Z z] = P [X + Y z] = P [(X, Y ) R] =
f (x, y)dxdy
f (x, y)dxdy =
R

Z + Z z
Z z Z +
Zz
=
f (w y, y)dwdy =
f (w y, y)dydw =
g(w)dw,
from which it follows that Z is a continuous r.v. with density g. That is, the density
function of Z is
Z +
fZ (z) =
f (z y, y)dy.
Special cases are:

(1) X and Y are independent. Then
Z +
fZ (z) =
fX (z y)fY (y)dy.
(2) X and Y are independent and non-negative. Then fY (y) = 0 if y 0, and fX (z

y) = 0 if z y 0, leading to
R
z f (z y)f (y)dy, if z 0,
Y
0 X
fZ (z) =
0
if z < 0.
The integral here is called the convolution of fX and fY .
Example. Suppose X and Y are independent exponential with parameters x and y .
Find the distribution of Z = X + Y . (Note: E(X) = 1
x .)
Solution. Z has density function
Zz
Zz
fZ (z) =
fX (z y)fY (y)dy =
x ex (zy) y ey y dy
0
0
2 zez
if x = y = ,

= x y
e y e x
if x y ,
x y
for z 0, and fZ (z) = 0 for z < 0.
5.3
5.3.1
The Moment Generating Function Method.

A Summary of Moment Generating Functions.
So far, the theory of moment generating functions has been dispersed throughout these
notes. Here, we present a summary of this theory.
76
Definition.
, then

Let X be a r.v. If there exists a > 0 such that E(etX ) < for all < t <
def
MX (t) = E(etX ),
< t <
is called the moment generating function (mgf) of X. We have

P
tx
xRX e fX (x) if X is discrete,

MX (t) = R + tx
if X is continuous.
e f (x)dx
Proposition 5.3.1 (Properties of An Mgf).
(5.1)
(1) MX (0) = 1,
(2) If a and b are numbers, MaX+b (t) = etb MX (at).

(n)
(0)
(3) (Moment Generating Property) MX (0) = E(X n ), n = 0, 1, . . . (where MX (t) =

MX (t)).
(4) MX (t) determines the distribution of X uniquely; i.e. if X and Y are such that
MX (t) = MY (t) for all t with < t < + (where > 0), then FX FY (X and Y
have the same distribution functions).
(5) If X and Y are independent random variables having mgfs MX (t) and MY (t), then
X + Y has mgf MX (t)MY (t). More generally, if X1 , X2 , . . . , Xn are independent r.vs
and if X = X1 + X2 + + Xn , then
MX (t) = MX1 (t)MX2 (t) MXn (t).
d
(6) (Continuity Property) Let X1 , X2 , . . . and X be r.v.s. Then Xn X (i.e. P [Xn x]

P [X x] as n for all points x R (of continuity of F (x) = P [X x]) if and
only if MXn (t) MX (t) as n , for all t R.
Proof. For (2), we have
MaX+b (t) = E[et(aX+b) ] = E[etaX etb ] = etb E[etaX ] = etb MX (at).
For (5), we have
MZ (t) = E[et(X+Y ) ] = E[etX etY ] = E[etX ]E[etY ] = MX (t)MY (t).
Example. Suppose X and Y are independent Poisson random variables with parameters x and y . Find the distribution of Z = X + Y .
Solution. We have
MZ (t) = MX (t)MY (t) = ex (e
t 1)
ey (e
t 1)
= e(x +y )(e
t 1)
By the uniqueness property of the mgf, we have X + Y Poisson(x + y ).
5.3. THE MOMENT GENERATING FUNCTION METHOD.
77
Example. Suppose X and Y are independent with distributions Gamma(1 , ) and

Gamma(2 , ). Find the distribution of Z = X + Y .
Solution. We have
MZ (t) = MX (t)MY (t) =
1
1
1
1
2
(1 t)
(1 t)
(1 t)1 +2
By the uniqueness property of the mgf, we have X + Y Gamma(1 + 2 , ).

Example. Suppose X and Y are independent exponential with the same parameter .
Find the distribution of Z = X + Y .
Solution. X + Y Gamma(2, ).
Definition. Let n 1 be an integer. A random variable X with distribution Gamma( n2 , 2)
2
is said to have the chi-square distribution with n degrees of freedom. We write X n
.
2
2
Example. Suppose X and Y are independent with X m
and Y n
. Find the
distribution of Z = X + Y .
2
Solution. X + Y m+n
.
Example. Suppose X and Y are independent with X N(x , x2 ) and Y N(y , y2 ).

Find the distribution of Z = X + Y .
Solution. We have
MZ (t) = MX (t)MY (t) = e
x t+
2 t2
x
2
y t+
2 t2
y
2
=e
(x +y )t+
2 + 2 )t 2
(x
y
2
By the uniqueness property of the mgf, we have X + Y N(x + y , x2 + y2 ).

2
Example. Suppose X1 , X2 , . . . , Xm are independent with Xi n
. Find the distribution
i
of X = X1 + X2 + + Xm .
Solution. We have
MX (t) = MX1 (t) MXm (t) =
2
so X n
.
1 ++nm
1
(1 2t)
n1
2
1
(1 2t)
nm
2
1
(1 2t)
n1 ++nm
2
78
Chapter 6
Law of Large Numbers and the Central
Limit Theorem.
Definition. A sequence X1 , X2 , . . . of random variables, all defined on the sample space
S, is said to be i.i.d. (independent and identically distributed) if they all have the same
distribution function and every finite subset of them is independent.
6.1
Law of Large Numbers.
Definition. Let Y1 , Y2 , . . . be a sequence of random variables, and let Y be a random

variable, all defined on a sample space S. We say that Yn tends to Y in probability, and
P
write Yn Y , as n , if
P [|Yn Y | > ] 0 as n
for every > 0.
Proposition 6.1.1 (The (Weak) Law of Large Numbers). Let X1 , X2 , . . . be a sequence of
i.i.d. random variables with finite mean and finite variance 2 . Define Sn = X1 + X2 +
+ Xn for n 1. Then
Sn
n
as n . That is,
P [|
Sn
| > ] 0 as n
n
for every > 0.

Proof. Note that E( Snn ) = and Var( Snn ) =
P [|
2
.
n
By Chebychev, we have
Sn
Var(Sn /n)
2
| > ]
=
0
n
2
n2
as n .
79
80
CHAPTER 6. LAW OF LARGE NUMBERS AND THE CENTRAL LIMIT THEOREM.
Remark.
Let
Suppose a coin with probability p of getting a head is tossed over and over.
1
Xi =
0
if the ith toss results in a head,

if the ith toss results in a tail.
Then Sn is the number of heads in n tosses of the coin, and

of course, the way we would estimate p in practice.
Sn
n
p as n . This is,
Proposition 6.1.2 (The (Strong) Law of Large Numbers). Let X1 , X2 , . . . be a sequence of

i.i.d. random variables with finite mean . Define Sn = X1 + X2 + + Xn for n 1.
Then
Sn ()
6 }
N = { S :
n
has P (N) = 0.
The proof of the SLLN (strong law of large numbers) is much more difficult than that
of the WLLN. The SLLN is a much stronger statement than the WLLN.
6.2
The Central Limit Theorem.
Theorem 6.2.1 (CLT). Let X1 , X2 , X3 , . . . be a sequence of i.i.d. random variables, each

with finite mean and finite variance 2 . Let Sn = X1 + X2 + + Xn for n 1. Then
Sn n d
N(0, 1),
n
meaning
Z x z2 /2
e
Sn n
x
dz as n for all x R
P
n
2
Remark.
Equivalently to (1), we have

"
P
=
where X
(6.1)
X
x
/ n
Zx
ez /2
dz as n for all x R
2
X1 ++Xn
.
n
Problem. Forty-nine pieces of material are to be fitted together to form one large
section. If the error made in each piece is uniform on [ 81 , 18 ], and the errors are
independent, what is the approximate probability that the magnitude of the total error
1
exceeds 4 ?
6.2. THE CENTRAL LIMIT THEOREM.
81
Solution. Let Xi = the error in the ith piece, and S49 = X1 + + X49 be the total error.
Then
(2/8)2
1
E(Xi ) = 0, Var(Xi ) =
=
,
12
16 12
so = .072 and n = .072 7 = .505. Thus

1
1
1
1
S49 n
P [|S49 | > ] = 1 P [ < S49 < ] = 1 P [ 4 <

< 4 ]
4
4
4
.505
n
.505
= 1 P [.494 < Z < .494] = .6242
from the normal tables.
Examples.
(1) Suppose X Binom(n, p). Let
1
Xi =
0
if ith trial results in S,

if ith trial results in F.
Then X1 , X2 , . . . are i.i.d. and X = X1 + X2 + + Xn for every n 1. From the

CLT, we have
"
P
i.e.
Xnp
npq
X np
x
npq
Zx
ez /2
dz as n for all x R
2
has approximately the distribution N(0, 1) for large n. The usual cri-
terion is n 25. As a numerical example, suppose X Binom(100, 12 ). We will

find the approximate value of P (48 < X 51) (i.e. P (X {49, 50, 51}). We make
the "continuity correction" and actually approximate P (49 X 51) where X is
binomial by calculating P (48.5 X 51.5) where now X N(50, npq = 25). We
have

48.5 50
X n
51.5 50
P (48.5 X 51.5) = P
= P [.3 < Z < .3] = .2358

5
n
5
from the tables. The exact answer is P (X {49, 50, 51}) = .2356, as calculated by
Mathematica.
2
(2) Suppose X n
(i.e. Gamma( = n2 , = 2)). Then we can write X = X1 + + Xn ,
where X1 , X2 , . . . are i.i.d. 12 random variables. Note that = EXi = 1 and 2 =
Var(Xi ) = 2. By the CLT in (1), we have
X n d
N(0, 1) as n .
2n
82
Reason for Continuity Correction.

p(x)
3/8
1/8
0
x
1
The rectangular outline is the probability function p(x) of X Bin(3, 2 ). The smooth
curve is the density function of N(1.5, .75). Suppose we want P [1 X 2] (i.e. P [X =
1 or 2]. This is the shaded rectangular area. To approximate it with a normal area, we
should take the area under the normal density curve between 0.5 and 2.5 (rather than
between 1 and 2).
83
84
Upper tail probabilities of the standard Normal distribution

P(Z z), Z N(0, 1)
z
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8
1.9
2.0
2.1
2.2
2.3
2.4
2.5
2.6
2.7
2.8
2.9
3.0
3.1
3.2
3.3
3.4
3.5
3.6
3.7
3.8
3.9
4.0
0.00
0.5000
0.4602
0.4207
0.3821
0.3446
0.3085
0.2743
0.2420
0.2119
0.1841
0.1587
0.1357
0.1151
0.0968
0.0808
0.0668
0.0548
0.0446
0.0359
0.0287
0.0228
0.0179
0.0139
0.0107
0.0082
0.0062
0.0047
0.0035
0.0026
0.0019
0.0013
0.0010
0.0007
0.0005
0.0003
0.0002
0.0002
0.0001
0.0001
0.0000
0.0000
0.01
0.4960
0.4562
0.4168
0.3783
0.3409
0.3050
0.2709
0.2389
0.2090
0.1814
0.1562
0.1335
0.1131
0.0951
0.0793
0.0655
0.0537
0.0436
0.0351
0.0281
0.0222
0.0174
0.0136
0.0104
0.0080
0.0060
0.0045
0.0034
0.0025
0.0018
0.0013
0.0009
0.0007
0.0005
0.0003
0.0002
0.0002
0.0001
0.0001
0.0000
0.0000
0.02
0.4920
0.4522
0.4129
0.3745
0.3372
0.3015
0.2676
0.2358
0.2061
0.1788
0.1539
0.1314
0.1112
0.0934
0.0778
0.0643
0.0526
0.0427
0.0344
0.0274
0.0217
0.0170
0.0132
0.0102
0.0078
0.0059
0.0044
0.0033
0.0024
0.0018
0.0013
0.0009
0.0006
0.0005
0.0003
0.0002
0.0001
0.0001
0.0001
0.0000
0.0000
0.03
0.4880
0.4483
0.4090
0.3707
0.3336
0.2981
0.2643
0.2327
0.2033
0.1762
0.1515
0.1292
0.1093
0.0918
0.0764
0.0630
0.0516
0.0418
0.0336
0.0268
0.0212
0.0166
0.0129
0.0099
0.0075
0.0057
0.0043
0.0032
0.0023
0.0017
0.0012
0.0009
0.0006
0.0004
0.0003
0.0002
0.0001
0.0001
0.0001
0.0000
0.0000
0.04
0.4840
0.4443
0.4052
0.3669
0.3300
0.2946
0.2611
0.2296
0.2005
0.1736
0.1492
0.1271
0.1075
0.0901
0.0749
0.0618
0.0505
0.0409
0.0329
0.0262
0.0207
0.0162
0.0125
0.0096
0.0073
0.0055
0.0041
0.0031
0.0023
0.0016
0.0012
0.0008
0.0006
0.0004
0.0003
0.0002
0.0001
0.0001
0.0001
0.0000
0.0000
0.05
0.4801
0.4404
0.4013
0.3632
0.3264
0.2912
0.2578
0.2266
0.1977
0.1711
0.1469
0.1251
0.1056
0.0885
0.0735
0.0606
0.0495
0.0401
0.0322
0.0256
0.0202
0.0158
0.0122
0.0094
0.0071
0.0054
0.0040
0.0030
0.0022
0.0016
0.0011
0.0008
0.0006
0.0004
0.0003
0.0002
0.0001
0.0001
0.0001
0.0000
0.0000
0.06
0.4761
0.4364
0.3974
0.3594
0.3228
0.2877
0.2546
0.2236
0.1949
0.1685
0.1446
0.1230
0.1038
0.0869
0.0721
0.0594
0.0485
0.0392
0.0314
0.0250
0.0197
0.0154
0.0119
0.0091
0.0069
0.0052
0.0039
0.0029
0.0021
0.0015
0.0011
0.0008
0.0006
0.0004
0.0003
0.0002
0.0001
0.0001
0.0001
0.0000
0.0000
0.07
0.4721
0.4325
0.3936
0.3557
0.3192
0.2843
0.2514
0.2206
0.1922
0.1660
0.1423
0.1210
0.1020
0.0853
0.0708
0.0582
0.0475
0.0384
0.0307
0.0244
0.0192
0.0150
0.0116
0.0089
0.0068
0.0051
0.0038
0.0028
0.0021
0.0015
0.0011
0.0008
0.0005
0.0004
0.0003
0.0002
0.0001
0.0001
0.0001
0.0000
0.0000
0.08
0.4681
0.4286
0.3897
0.3520
0.3156
0.2810
0.2483
0.2177
0.1894
0.1635
0.1401
0.1190
0.1003
0.0838
0.0694
0.0571
0.0465
0.0375
0.0301
0.0239
0.0188
0.0146
0.0113
0.0087
0.0066
0.0049
0.0037
0.0027
0.0020
0.0014
0.0010
0.0007
0.0005
0.0004
0.0003
0.0002
0.0001
0.0001
0.0001
0.0000
0.0000
0.09
0.4641
0.4247
0.3859
0.3483
0.3121
0.2776
0.2451
0.2148
0.1867
0.1611
0.1379
0.1170
0.0985
0.0823
0.0681
0.0559
0.0455
0.0367
0.0294
0.0233
0.0183
0.0143
0.0110
0.0084
0.0064
0.0048
0.0036
0.0026
0.0019
0.0014
0.0010
0.0007
0.0005
0.0003
0.0002
0.0002
0.0001
0.0001
0.0001
0.0000
0.0000
85
Area
86
Probabilities of the standard Normal distribution

P(0 Z z), Z N(0, 1)
z
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8
1.9
2.0
2.1
2.2
2.3
2.4
2.5
2.6
2.7
2.8
2.9
3.0
3.1
3.2
3.3
3.4
3.5
3.6
3.7
3.8
3.9
4.0
0.00
0.0000
0.0398
0.0793
0.1179
0.1554
0.1915
0.2257
0.2580
0.2881
0.3159
0.3413
0.3643
0.3849
0.4032
0.4192
0.4332
0.4452
0.4554
0.4641
0.4713
0.4772
0.4821
0.4861
0.4893
0.4918
0.4938
0.4953
0.4965
0.4974
0.4981
0.4987
0.4990
0.4993
0.4995
0.4997
0.4998
0.4998
0.4999
0.4999
0.5000
0.5000
0.01
0.0040
0.0438
0.0832
0.1217
0.1591
0.1950
0.2291
0.2611
0.2910
0.3186
0.3438
0.3665
0.3869
0.4049
0.4207
0.4345
0.4463
0.4564
0.4649
0.4719
0.4778
0.4826
0.4864
0.4896
0.4920
0.4940
0.4955
0.4966
0.4975
0.4982
0.4987
0.4991
0.4993
0.4995
0.4997
0.4998
0.4998
0.4999
0.4999
0.5000
0.5000
0.02
0.0080
0.0478
0.0871
0.1255
0.1628
0.1985
0.2324
0.2642
0.2939
0.3212
0.3461
0.3686
0.3888
0.4066
0.4222
0.4357
0.4474
0.4573
0.4656
0.4726
0.4783
0.4830
0.4868
0.4898
0.4922
0.4941
0.4956
0.4967
0.4976
0.4982
0.4987
0.4991
0.4994
0.4995
0.4997
0.4998
0.4999
0.4999
0.4999
0.5000
0.5000
0.03
0.0120
0.0517
0.0910
0.1293
0.1664
0.2019
0.2357
0.2673
0.2967
0.3238
0.3485
0.3708
0.3907
0.4082
0.4236
0.4370
0.4484
0.4582
0.4664
0.4732
0.4788
0.4834
0.4871
0.4901
0.4925
0.4943
0.4957
0.4968
0.4977
0.4983
0.4988
0.4991
0.4994
0.4996
0.4997
0.4998
0.4999
0.4999
0.4999
0.5000
0.5000
0.04
0.0160
0.0557
0.0948
0.1331
0.1700
0.2054
0.2389
0.2704
0.2995
0.3264
0.3508
0.3729
0.3925
0.4099
0.4251
0.4382
0.4495
0.4591
0.4671
0.4738
0.4793
0.4838
0.4875
0.4904
0.4927
0.4945
0.4959
0.4969
0.4977
0.4984
0.4988
0.4992
0.4994
0.4996
0.4997
0.4998
0.4999
0.4999
0.4999
0.5000
0.5000
0.05
0.0199
0.0596
0.0987
0.1368
0.1736
0.2088
0.2422
0.2734
0.3023
0.3289
0.3531
0.3749
0.3944
0.4115
0.4265
0.4394
0.4505
0.4599
0.4678
0.4744
0.4798
0.4842
0.4878
0.4906
0.4929
0.4946
0.4960
0.4970
0.4978
0.4984
0.4989
0.4992
0.4994
0.4996
0.4997
0.4998
0.4999
0.4999
0.4999
0.5000
0.5000
0.06
0.0239
0.0636
0.1026
0.1406
0.1772
0.2123
0.2454
0.2764
0.3051
0.3315
0.3554
0.3770
0.3962
0.4131
0.4279
0.4406
0.4515
0.4608
0.4686
0.4750
0.4803
0.4846
0.4881
0.4909
0.4931
0.4948
0.4961
0.4971
0.4979
0.4985
0.4989
0.4992
0.4994
0.4996
0.4997
0.4998
0.4999
0.4999
0.4999
0.5000
0.5000
0.07
0.0279
0.0675
0.1064
0.1443
0.1808
0.2157
0.2486
0.2794
0.3078
0.3340
0.3577
0.3790
0.3980
0.4147
0.4292
0.4418
0.4525
0.4616
0.4693
0.4756
0.4808
0.4850
0.4884
0.4911
0.4932
0.4949
0.4962
0.4972
0.4979
0.4985
0.4989
0.4992
0.4995
0.4996
0.4997
0.4998
0.4999
0.4999
0.4999
0.5000
0.5000
0.08
0.0319
0.0714
0.1103
0.1480
0.1844
0.2190
0.2517
0.2823
0.3106
0.3365
0.3599
0.3810
0.3997
0.4162
0.4306
0.4429
0.4535
0.4625
0.4699
0.4761
0.4812
0.4854
0.4887
0.4913
0.4934
0.4951
0.4963
0.4973
0.4980
0.4986
0.4990
0.4993
0.4995
0.4996
0.4997
0.4998
0.4999
0.4999
0.4999
0.5000
0.5000
0.09
0.0359
0.0753
0.1141
0.1517
0.1879
0.2224
0.2549
0.2852
0.3133
0.3389
0.3621
0.3830
0.4015
0.4177
0.4319
0.4441
0.4545
0.4633
0.4706
0.4767
0.4817
0.4857
0.4890
0.4916
0.4936
0.4952
0.4964
0.4974
0.4981
0.4986
0.4990
0.4993
0.4995
0.4997
0.4998
0.4998
0.4999
0.4999
0.4999
0.5000
0.5000

Introduction To Probability

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Introduction To Probability

Uploaded by

Copyright:

Available Formats

INTRODUCTION

2 Discrete Random Variables.

3 Continuous Random Variables.

Independent Random Variables. . . . . . . . . . . . . .

6 Law of Large Numbers and the Central Limit Theorem.

5 Functions of Random Variables.

Some examples of events are

CHAPTER 1. INTRODUCTION AND DEFINITIONS.

So far, these are all finite sample spaces.

A and B are compound and C is a simple event.

If A, B are events, then

(1) A B is the set of outcomes that belong to A or to B, or to both,

1.1. BASIC DEFINITIONS.

Terminology. If when the experiment is performed, the outcome occurs, and if A

That is, P (A1 A2 ) = P (A1 ) + P (A2 ) + .

If P is a probability on S, the pair (S, P ) is called a probability space.

Proposition 1.1.1. Let P be a probability on S.

That is, P (A1 A2 An ) = P (A1 ) + P (A2 ) + + P (An ).

CHAPTER 1. INTRODUCTION AND DEFINITIONS.

Proposition 1.1.2 (Rules for Computing Probabilities).

(1) P (Ac ) = 1 P (A).

1.1. BASIC DEFINITIONS.

Calculating Probabilities of Events in a Discrete Sample Space. In a discrete sample

Example. Suppose a die has probabilities

Solution. Let A = {2, 4, 6}. Then

where |A| is the number of outcomes in A.

CHAPTER 1. INTRODUCTION AND DEFINITIONS.

Since fish are chosen at random, then S is equiprobable.

Since fish are chosen at random, then S is equiprobable.

1.2. PERMUTATIONS AND COMBINATIONS.

Permutations and Combinations.

Example. A committee consisting of a president, a vice-president, and a treasurer is

CHAPTER 1. INTRODUCTION AND DEFINITIONS.

Solution. (1) 5 4 3 = 60 (2) 5 5 5 = 125.

If n is an integer like 1, 2, . . ., we define

The number of permutations (i.e. orderings) of n objects, taken r at a

Example. Suppose we have 4 objects A1 , A2 , A3 , A4 . The combinations taken two at a

1.2. PERMUTATIONS AND COMBINATIONS.

CHAPTER 1. INTRODUCTION AND DEFINITIONS.

Example. Suppose we have n symbols, of which x are Ss and n x are F s. How

ways. The last equality comes after a little arithmetic.

1.3. CONDITIONAL PROBABILITY AND INDEPENDENCE.

Conditional Probability and Independence.

This suggests the following definition.

CHAPTER 1. INTRODUCTION AND DEFINITIONS.

1.3. CONDITIONAL PROBABILITY AND INDEPENDENCE.

P [{both} {at least one}]

P [first an ace|second an ace] =

CHAPTER 1. INTRODUCTION AND DEFINITIONS.

= 1. If C and D are mutually exclusive events, then

1.3. CONDITIONAL PROBABILITY AND INDEPENDENCE.

Two events A and B are called independent if P (A B) = P (A)P (B).

Solution. For (i), we have

Solution. Let S = {Susan passes}, G = {Georges passes}. We assume S and G are

Solution. We are working in the sample space consisting of 36 outcomes.

CHAPTER 1. INTRODUCTION AND DEFINITIONS.

More on Independence. Three events A, B, C are independent if

Bayes Rule and the Law of Total Probability.

The events B1 , B2 , . . . Bn form a partition of S if

(1) they are pairwise mutually exclusive (i.e. Bi Bj = if i j),

1.4. BAYES RULE AND THE LAW OF TOTAL PROBABILITY.

and then we substitute for P (A) from part (1).

Remark. Everything remains valid if n is replaced by . That is if we have a partition