INTRODUCTION
TO
PROBABILITY
William J. Anderson
McGill University
Contents
1 Introduction and Definitions.
  1.1 Basic Definitions.
  1.2 Permutations and Combinations.
  1.3 Conditional Probability and Independence.
  1.4 Bayes' Rule and the Law of Total Probability.
2 Discrete Random Variables.
3 Continuous Random Variables.
4 Multivariate Distributions.
  4.1 Definitions.
  4.2 Marginal Distributions and the Expected Value of Functions of Random Variables.
    4.2.1 Special Theorems.
    4.2.2 Covariance.
  4.3 Conditional Probability and Density Functions.
Chapter 1
Introduction and Definitions.
1.1
Basic Definitions.
Definition. An experiment E is a procedure which can result in one or several outcomes. The set of all possible outcomes of an experiment is called the sample space S (more commonly Ω). A generic outcome will be denoted by ω. An event is a subset of the sample space. Events are usually denoted by upper case letters near the beginning of the alphabet, like A, B, C. An event which consists of only one outcome is called a simple (or elementary) event; otherwise it is a compound event.
Examples.
(1) Toss a coin. Then S = {H, T }. A = {H} is a simple event. We can also write
A = {get a head}.
(2) Toss a die. Then S = {1, 2, 3, 4, 5, 6}. A = {2, 4, 6} is an event. We could also write
A = {get an even number}. Another event is B = {1, 3, 4, 6}. C = {6} is a simple
event.
(3) Toss two dice. Then

        (1, 1) (1, 2) (1, 3) (1, 4) (1, 5) (1, 6)
        (2, 1) (2, 2) (2, 3) (2, 4) (2, 5) (2, 6)
        (3, 1) (3, 2) (3, 3) (3, 4) (3, 5) (3, 6)
    S = (4, 1) (4, 2) (4, 3) (4, 4) (4, 5) (4, 6)
        (5, 1) (5, 2) (5, 3) (5, 4) (5, 5) (5, 6)
        (6, 1) (6, 2) (6, 3) (6, 4) (6, 5) (6, 6)
(4) Spin a pointer and observe the angle (in degrees) at which it comes to rest. Assuming we can measure the rest angle to any degree of accuracy, then S = [0, 360) and is uncountably infinite. Some examples of events are
A = [15.0, 85.0],
B = (145.6678, 279.5000],
C = {45.7}.
Combinations of Events.

If A1, A2, …, An are pairwise disjoint events, then

    P(A1 ∪ A2 ∪ ⋯ ∪ An) = P(A1) + P(A2) + ⋯ + P(An).

Proposition 1.1.2. Let A and B be events. Then

(1) P(Ac) = 1 − P(A),
(2) if A ⊆ B, then
    (a) P(B \ A) = P(B) − P(A),
    (b) P(A) ≤ P(B),
(3) P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

Proof.
(1) We have S = A ∪ Ac, a disjoint union, so 1 = P(S) = P(A ∪ Ac) = P(A) + P(Ac).
(2) We have B = A ∪ (B \ A), a disjoint union, so P(B) = P(A) + P(B \ A).
(3) We have A ∪ B = A ∪ [B \ (A ∩ B)], a disjoint union, so P(A ∪ B) = P(A) + P[B \ (A ∩ B)] = P(A) + P(B) − P(A ∩ B).
Problem. Show that

    P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C).
Definition. A finite sample space is said to be equiprobable if every outcome has the
same probability of occurring.
Proposition 1.1.3. Let A be an event in an equiprobable sample space S. Then
    P(A) = |A| / |S|,

where |A| denotes the number of outcomes in A.
Example. Two balanced dice are rolled. What is the probability of getting a sum of
seven?
Solution. {sum of 7} = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}, so P [sum of 7] = 6/36.
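This count is easy to confirm by brute-force enumeration; a minimal Python sketch:

```python
from fractions import Fraction

# all 36 equally likely outcomes of tossing two dice
outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]
favorable = [o for o in outcomes if sum(o) == 7]

p_sum7 = Fraction(len(favorable), len(outcomes))
print(p_sum7)  # 1/6, i.e. 6/36
```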
Example. A tank has three red fish and two blue fish. Two fish are chosen at random
and without replacement. What is the probability of getting
(1) red fish first and then a blue fish?
(2) both fish red?
(3) one red fish and one blue fish?
Note: "without replacement" means a first fish is chosen from the five, and then a second fish is chosen from the remaining four. "At random" means every pair of fish chosen in this way has the same probability.
Solution. List the sample space. Let the fish be R1, R2, R3 and B1, B2. Then

        R1R2  R1R3  R1B1  R1B2
        R2R1  R2R3  R2B1  R2B2
    S = R3R1  R3R2  R3B1  R3B2
        B1R1  B1R2  B1R3  B1B2
        B2R1  B2R2  B2R3  B2B1

a set of 20 equally likely ordered outcomes. Counting outcomes gives (1) P[red first, then blue] = 6/20, (2) P[both red] = 6/20, and (3) P[one red and one blue] = 12/20. If instead we ignore the order of the two draws, the sample space is the set of ten equally likely unordered pairs

    S = R1,R2  R1,R3  R1,B1  R1,B2  R2,R3
        R2,B1  R2,B2  R3,B1  R3,B2  B1,B2
1.2
Permutations and Combinations.
When we used the method of listing the sample space, we didn't need to know the exact form of an event, just the number of outcomes in the event.
Basic Principle of Counting. Suppose there are two operations op1 and op2. If op1
can be done in m ways, and op2 can be done in n ways, then the combined operation
(op1,op2) can be done in mn ways.
Example. Suppose there are two types B1 and B2 of bread, and three types F1 , F2 , F3 of
filling. How many types of sandwich can be made?
Solution. Operation 1 (choose the bread) can be done in 2 ways, and operation 2 (choose the filling) in 3 ways, so the combined operation (make a sandwich) can be done in 2 × 3 = 6 ways. The resulting sandwiches are
B1 F1 B2 F1
B1 F2 B2 F2
B1 F3 B2 F3
More generally, if there are k operations, of which the first can be done in m1 ways, the second in m2 ways, …, and the kth in mk ways, then the combined operation can be done in m1 × m2 × ⋯ × mk ways.
Example. How many three letter words can be formed from the letters a,b,c,d,e if
(1) each letter can only be used once?
(2) each letter can be used more than once?
Factorial Notation. For a positive integer n, define n! = n(n − 1)(n − 2) ⋯ (2)(1) ("n factorial"). We also define 0! = 1.
Example. 1! = 1, 2! = 2, 3! = 6, 4! = 24, etc.
Permutations. The number of permutations of n objects taken r at a time (i.e. the number of ordered arrangements of r of the n objects) is

    P_r^n = n(n − 1)(n − 2) ⋯ (n − r + 1) = n! / (n − r)!.
Example. P_2^4 = 12, P_3^5 = 60. P_n^n = n! is the number of ways in which n objects can be ordered.
Combinations. The number of combinations of n objects taken r at a time (i.e. the number of ways you can choose r objects from n objects) is

    C_r^n = n! / (r!(n − r)!).

(C_r^n is also denoted by (n over r).) Note that

    C_0^n = n!/(0!n!) = 1,   C_n^n = n!/(n!0!) = 1,   C_1^n = n,   C_{n−1}^n = n.
Example.

    C_2^4 = 4!/(2!2!) = 6,   C_3^5 = 5!/(3!2!) = 10.
For example, each of the C_2^4 = 6 combinations of the four objects A1, A2, A3, A4 taken two at a time gives rise to 2! ordered arrangements:

    A1A2  A1A3  A1A4  A2A3  A2A4  A3A4
    A2A1  A3A1  A4A1  A3A2  A4A2  A4A3

which are all the permutations of these 4 objects taken two at a time. That is, P_2^4 = 2! C_2^4. In general, we have P_r^n = r! C_r^n, which is how the formula for C_r^n is obtained.
Example. In how many ways can a committee of 3 be chosen from a club of 6 people?
Solution. C36 = 20.
Example. (No. 2.166, 7th ed.) Eight tires of different brands are ranked from 1 to 8
(best to worst). In how many ways can four of the tires be chosen so that the best tire
in the sample is actually ranked third among the eight?
Solution. Identify the tires by their rankings. Among the four tires, one must be tire 3, and the other three must be chosen from tires 4, 5, 6, 7, 8. This latter can be done in C_3^5 = 10 ways, so the answer is 10.
Example. A club consists of 9 people, of which 4 are men and 5 are women. In how
many ways can a committee of 5 be chosen, if it is to consist of 3 women and 2 men?
Solution. Let m1 = number of ways to choose the women = C_3^5, and m2 = number of ways to choose the men = C_2^4. Then the number of ways to choose the committee is C_3^5 × C_2^4 = 10 × 6 = 60.
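Python's standard library exposes these counts directly as `math.comb`; a quick check of the two committee examples:

```python
from math import comb

# C(6,3): committees of 3 from a club of 6
print(comb(6, 3))  # 20

# C(5,3) * C(4,2): 3 women from 5 and 2 men from 4
print(comb(5, 3) * comb(4, 2))  # 60
```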
Example. A box contains 9 pieces of fruit, of which 4 are bananas and 5 are peaches.
A sample of 5 pieces of fruit is chosen at random. What is the probability this sample
will contain
(1) exactly 2 bananas and 3 peaches?
(2) no bananas?
(3) more peaches than bananas?
Solution.
(1) The number of ways of choosing a sample consisting of 2 bananas and 3 peaches is C_2^4 C_3^5. The number of ways of choosing a sample of 5 is C_5^9. Hence the answer is

    C_2^4 C_3^5 / C_5^9.

(2) C_0^4 C_5^5 / C_5^9 = 1/C_5^9 = 1/126.

(3) Let B be the number of bananas in the sample and P the number of peaches. Then

    P[B < P] = P[B = 0, P = 5] + P[B = 1, P = 4] + P[B = 2, P = 3]
             = (C_0^4 C_5^5 + C_1^4 C_4^5 + C_2^4 C_3^5) / C_5^9.

Partitions. The number of ways of partitioning n objects into k groups, of which the first is to contain n1 objects, the second n2 objects, …, and the kth nk objects (with n1 + n2 + ⋯ + nk = n), is

    (n over n1, n2, …, nk) =def n! / (n1! n2! ⋯ nk!).

Proof. Let operation 1 be to choose n1 objects for the first group, …, operation k be to choose nk objects for the kth group. Operation 1 can be done in C_{n1}^n ways. Operation 2 can be done in C_{n2}^{n−n1} ways, and so on. Then the combined operation can be done in

    C_{n1}^n × C_{n2}^{n−n1} × ⋯ × C_{nk}^{n−n1−⋯−n_{k−1}} = n! / (n1! n2! ⋯ nk!)

ways.
Solution.
(1) Send the taxi that needs repair to C. The remaining 8 taxis can be dispatched in

    (8 over 3, 5) = 8!/(3!5!) = 56 ways.

(2) The taxis needing repair can be assigned in 3! = 6 ways. The remaining six taxis can be assigned in (6 over 2, 4) = 6!/(2!4!) = 15 ways, so the answer is 6 × 15 = 90 ways.
Here is a second solution of the red fish, blue fish example.
Example. A tank has three red fish and two blue fish. Two fish are chosen at random
and without replacement. What is the probability of getting
(1) red fish first and then a blue fish?
(2) both fish red?
(3) one red fish and one blue fish?
Solution. For (3), we have P[1 red, 1 blue] = P[red first, blue second] + P[blue first, red second].
1.3
Conditional Probability and Independence.
Suppose a balanced die is tossed in the next room. We are told that a number less than
4 was observed. What is the probability the number was either 1 or 2?
Let
A = {1, 2},
B = {1, 2, 3}.
Then, what is the probability of A given that event B has occurred? This is denoted by P(A|B). The answer is that if we know that B has occurred, then the sample space reduces to S′ = B, and so P(A|B) = two chances in three = 2/3. Now notice that

    P(A|B) = 2/3 = (2/6)/(3/6) = P(A ∩ B)/P(B).

Definition. If P(B) > 0, the conditional probability of A given B is

    P(A|B) = P(A ∩ B)/P(B).

The vertical slash is read as "given". Note that P(A|B) ≠ P(B|A) in general (in fact, they are equal iff P(A) = P(B)).
Example. Toss two balanced dice. Let A = {sum of 5} and B = {first die ≤ 2}. Then A ∩ B = {(1, 4), (2, 3)} and B = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6)}, so

    P(A|B) = P(A ∩ B)/P(B) = (2/36)/(12/36) = 2/12.
Example. Two balanced dice are tossed. What is the probability that the first die gives
a number less than three, given that the sum is odd?
Solution. Let A = {first die less than 3} and B = {sum is odd}. Then
A B = {(1, 2), (1, 4), (1, 6), (2, 1), (2, 3), (2, 5)},
so
    P(A|B) = P(A ∩ B)/P(B) = (6/36)/(1/2) = 12/36 = 1/3.

The Multiplicative Rule. This is

    P(A ∩ B) = P(A|B)P(B).
The following is a third way of doing the red fish, blue fish example.
Example. A tank has three red fish and two blue fish. Two fish are chosen at random
and without replacement. What is the probability of getting
(1) red fish first and then a blue fish?
(2) both fish red?
(3) one red fish and one blue fish?
Solution.
P[red first and blue second] = P[{red first} ∩ {blue second}] = P(blue second|red first)P(red first)
    = (2/4)(3/5) = 6/20.
P[both red] = P[{red first} ∩ {red second}] = P(red second|red first)P(red first)
    = (2/4)(3/5) = 6/20.
P[blue first and red second] = P[{blue first} ∩ {red second}] = P(red second|blue first)P(blue first)
    = (3/4)(2/5) = 6/20.
Hence P[one red, one blue] = P[{red first and blue second} ∪ {blue first and red second}] = P[{red first and blue second}] + P[{blue first and red second}] = 12/20.
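All three solutions of the red fish, blue fish example can be confirmed by enumerating the 20 ordered draws; a short sketch using the same labels:

```python
from fractions import Fraction
from itertools import permutations

fish = ["R1", "R2", "R3", "B1", "B2"]
draws = list(permutations(fish, 2))  # 20 equally likely ordered pairs

def prob(event):
    # probability of an event as a fraction of the 20 outcomes
    return Fraction(sum(1 for d in draws if event(d)), len(draws))

p_red_then_blue = prob(lambda d: d[0][0] == "R" and d[1][0] == "B")
p_both_red = prob(lambda d: d[0][0] == "R" and d[1][0] == "R")
p_one_each = prob(lambda d: {d[0][0], d[1][0]} == {"R", "B"})
print(p_red_then_blue, p_both_red, p_one_each)  # 3/10 3/10 3/5
```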
Example. Toss an unbalanced die with probabilities p(1) = .1, p(2) = .1, p(3) = .3, p(4) = .2, p(5) = .1, p(6) = .2. Let A = {≥ 5}, B = {≥ 2}. Since A ∩ B = A, then

    P(A|B) = P(A ∩ B)/P(B) = P(A)/P(B) = .3/.9 = 1/3.
Example. Two balanced coins were tossed, and it is known that at least one was a
head. What is the probability that both were heads?
Solution. We have

    P[both|at least one] = P[both]/P[at least one] = (1/4)/(3/4) = 1/3.
Example. Two cards are drawn without replacement from a standard deck. Find the
probability that
(1) the second is an ace, given that the first is not an ace.
(2) the second is an ace.
(3) the first was an ace, given that the second is an ace.
Solution.
(1) 4/51.
(2)

    P[second an ace] = P[second an ace|first an ace]P[first an ace]
                       + P[second an ace|first not an ace]P[first not an ace]
                     = (3/51)(4/52) + (4/51)(48/52) = 4/52.

(3)

    P[first an ace|second an ace] = P[first an ace, second an ace]/P[second an ace]
                                  = P[second an ace|first an ace]P[first an ace]/P[second an ace]
                                  = (3/51)(4/52)/(4/52) = 3/51.
Example. The numbers 1 to 5 are written on five slips of paper and placed in a hat. Two slips are drawn at random without replacement. What is the probability that the first number is 3, given a sum of seven?
Solution. Let A = {first a three} = {(3, 1), (3, 2), (3, 4), (3, 5)}, B = {sum of seven} = {(2, 5), (3, 4), (4, 3), (5, 2)}. Since A ∩ B = {(3, 4)}, and the sample space has 20 outcomes, then

    P[first a three|sum of seven] = P(A|B) = P(A ∩ B)/P(B) = (1/20)/(4/20) = 1/4.
Example. A card is selected at random (i.e. every card has the same probability of
being chosen) from a deck of 52. What is the probability it is a red card or a face card?
Solution. R = {red card}, F = {face card}. Then

    P(R ∪ F) = P(R) + P(F) − P(R ∩ F) = 26/52 + 12/52 − 6/52 = 32/52.
Proposition 1.3.1 (Properties of Conditional Probability). Fix an event B with P(B) > 0. Then
(1) P(S|B) = 1, P(∅|B) = 0.
(2) P(B|B) = 1.
(3) P(Ac|B) = 1 − P(A|B). (Ac = complement of A)
(4) P(C ∪ D|B) = P(C|B) + P(D|B) if C ∩ D = ∅.

Proof. We have P(S|B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1, and

    P(C ∪ D|B) = P[(C ∪ D) ∩ B]/P(B) = P[(C ∩ B) ∪ (D ∩ B)]/P(B)
               = [P(C ∩ B) + P(D ∩ B)]/P(B) = P(C|B) + P(D|B).
Remark. Fix an event B with P (B) > 0, and for any event A, define Q(A) = P (A|B).
Then Q is a probability.
Proposition 1.3.2. The following are equivalent statements:
(1) P (B|A) = P (B)
(2) P (A|B) = P (A).
(3) P(A ∩ B) = P(A)P(B).
Definition. Two events A and B are called independent if any one (and therefore all) of
the above conditions holds. We will actually take as our definition the third statement.
Problem. Show that if A and B are independent, then so are (i) Ac and B, (ii) Ac and
Bc .
Example. Suppose Susan and Georges are writing the 323 exam. The probability that
Susan will pass is .70, and the probability that Georges will pass is .60. What is the
probability that (i) both will pass (ii) at least one will pass?
Example. Suppose an unbalanced die with probs p(1) = .1, p(2) = .1, p(3) = .3, p(4) =
.2, p(5) = .1, p(6) = .2 is tossed twice. What is the probability of getting
(1) (3, 2) (i.e. a 3 on the first toss and a 2 on the second)?
(2) A sum of four?
1.4
Bayes' Rule and the Law of Total Probability.
Definition. Events B1, B2, …, Bn form a partition of S if they are pairwise disjoint and B1 ∪ B2 ∪ ⋯ ∪ Bn = S.

Theorem (Law of Total Probability). If B1, …, Bn is a partition of S, then for any event A,

    P(A) = Σ_{i=1}^n P(A ∩ Bi) = Σ_{i=1}^n P(A|Bi)P(Bi).
(2) (Bayes' Rule) For each k,

    P(Bk|A) = P(A ∩ Bk)/P(A) = P(A|Bk)P(Bk)/P(A).

Example. For a three-way partition F1, F2, F3 and an event C, Bayes' rule gives

    P(F1|C) = P(C|F1)P(F1) / [P(C|F1)P(F1) + P(C|F2)P(F2) + P(C|F3)P(F3)] = .01/.19 = 1/19.
Chapter 2
Discrete Random Variables.
Definition. Let S be a sample space. A random variable (rv) X on S is a function X : S → R. Let RX denote the range of X. X is called a discrete random variable if RX is a countable set. In this chapter, we deal with discrete random variables.
2.1
Basic Definitions.
Example. Suppose a coin is tossed three times. Let X be the number of heads observed. The sample space is

        HHH → 3
        HHT → 2
        HTH → 2
    S = HTT → 1
        THH → 2
        THT → 1
        TTH → 1
        TTT → 0

That is, we have X(HHH) = 3, X(HHT) = 2, X(HTH) = 2, and so on. Hence RX = {0, 1, 2, 3}.
Definition. The probability function (pf) of a discrete rv X is

    fX(x) = P[X = x],   x ∈ RX.

For any A ⊆ RX, P[X ∈ A] = Σ_{x∈A} fX(x).

Example. For the rv X of the previous example,

    P[X = 0] = P[TTT] = 1/8,
    P[X = 1] = P[HTT, THT, TTH] = 3/8,
    P[X = 2] = P[HHT, HTH, THH] = 3/8,
    P[X = 3] = P[HHH] = 1/8.
In tabular form, the pf is

    x       0    1    2    3
    fX(x)  1/8  3/8  3/8  1/8

Definition. The expected value of X is

    E(X) = Σ_{x∈RX} x fX(x).

This is also called the expectation of X, or the mean of X. E(X) is frequently denoted by μX.
Example. For the rv X of the previous example, we have

    E(X) = (0 × 1/8) + (1 × 3/8) + (2 × 3/8) + (3 × 1/8) = 1.5.
Proposition. Let X be a discrete rv and let Y = g(X), where g : RX → R. Then

    E(Y) = Σ_{x∈RX} g(x) fX(x).

Proof.

    E(Y) = Σ_{y∈RY} y P[Y = y] = Σ_{y∈RY} y P[X ∈ g⁻¹(y)] = Σ_{y∈RY} y Σ_{x∈g⁻¹(y)} fX(x)
         = Σ_{y∈RY} Σ_{x∈g⁻¹(y)} g(x) fX(x) = Σ_{x∈RX} g(x) fX(x).
Examples.
(1) For the rv X of the previous two examples, we have

    E(X²) = Σ_{x=0}^3 x² fX(x) = (0 × 1/8) + (1 × 3/8) + (4 × 3/8) + (9 × 1/8) = 3.

(2) If g(x) = 5x/(x² + 1), then g(X) = 5X/(X² + 1).
In particular,

    E[g1(X) + g2(X)] = Σ_{x∈RX} g1(x)fX(x) + Σ_{x∈RX} g2(x)fX(x) = E[g1(X)] + E[g2(X)].

Definition. The variance of X is

    Var(X) = E[(X − μ)²],

where μ = μX = E(X). We also denote Var(X) by σX². The positive square root σX = √Var(X) is called the standard deviation of X.
Note that Var(X) = E(X²) − μ², since

    Σ_{x∈RX} (x − μ)² fX(x) = Σ_{x∈RX} (x² − 2μx + μ²) fX(x)
        = Σ_{x∈RX} x² fX(x) − 2μ Σ_{x∈RX} x fX(x) + μ² Σ_{x∈RX} fX(x) = E(X²) − 2μ² + μ² = E(X²) − μ².
For large n, we have ni/n ≈ pi. (After all, that is how we would determine pi.) Hence for large n, we would have

    average payoff per trial ≈ p1x1 + p2x2 + ⋯ + pmxm.

In terms of X, this is

    E(X) = Σ_{i=1}^m xi P[X = xi].
So we think of E(X) as the average value of X if the experiment were repeated a large
number of times.
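This long-run-average interpretation is easy to watch numerically; the sketch below uses the three-coin-toss rv from earlier in the chapter and compares E(X) with a simulated average (the seed and repetition count are arbitrary choices).

```python
import random
from fractions import Fraction

# pf of X = number of heads in three tosses of a balanced coin
pf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
mean = sum(x * p for x, p in pf.items())
print(mean)  # 3/2

# average of X over many simulated repetitions of the experiment
random.seed(0)
n = 100_000
total = sum(sum(random.random() < 0.5 for _ in range(3)) for _ in range(n))
avg = total / n
print(avg)  # close to 1.5
```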
2.2
Definition. A rv X that can take only two values (usually 0 and 1, or −1 and 1) is said to be a Bernoulli rv.
2.2.1
The Binomial Distribution.
Suppose we have an experiment with only two outcomes, S (success) and F (failure),
with probabilities p and q respectively (Note that p + q = 1). For example,
(1) toss a coin
(2) roll a balanced die. "Success" might mean getting a six, and "failure" anything else, so that p = 1/6 and q = 5/6.
Each time this experiment is performed, it is called a trial (specifically a Bernoulli trial, because there are only two outcomes). The experiment is performed n times in such a way that whatever happens on any one trial is independent of what happens on any other trial. This is called having n independent trials. Let
X = the number of successes observed in the n trials.
Then X has range set RX = {0, 1, 2, . . . , n}. X is called a binomial random variable. We
write X Bin(n, p).
Proposition 2.2.1. X has probability function given by

    P{X = x} = C_x^n p^x q^{n−x},   x = 0, 1, 2, …, n,

where q = 1 − p.
Proof. Let us look at the case n = 3. The sample space is

        SSS → p³
        SSF → p²q
        SFS → p²q
    S = SFF → pq²
        FSS → p²q
        FSF → pq²
        FFS → pq²
        FFF → q³

where for example

    P[FFS] = P[{F on 1st trial} ∩ {F on 2nd trial} ∩ {S on 3rd trial}]
           = P[{F on 1st trial}]P[{F on 2nd trial}]P[{S on 3rd trial}] = q²p.
Note that the probability of an outcome depends only on the number of S's and F's in the outcome, not their order. So

    P{X = 2} = P{SSF, SFS, FSS} = P(SSF) + P(SFS) + P(FSS) = 3p²q.

More generally,

    P{X = x} = (number of outcomes with x S's and n − x F's) × p^x q^{n−x} = C_x^n p^x q^{n−x}.
Remark. By the binomial theorem, (a + b)^n = Σ_{x=0}^n C_x^n a^x b^{n−x}, so Σ_{x=0}^n P{X = x} = (p + q)^n = 1.

Proposition 2.2.2. If X ~ Bin(n, p), then E(X) = np and Var(X) = npq.
Proof.

    E(X) = Σ_{x=1}^n x [n!/(x!(n − x)!)] p^x q^{n−x}
         = np Σ_{x=1}^n [(n − 1)!/((x − 1)![(n − 1) − (x − 1)]!)] p^{x−1} q^{(n−1)−(x−1)}
         = np Σ_{y=0}^m [m!/(y!(m − y)!)] p^y q^{m−y} = np,

where y = x − 1 and m = n − 1. Similarly,

    E[X(X − 1)] = Σ_{x=2}^n x(x − 1) [n!/(x!(n − x)!)] p^x q^{n−x}
                = n(n − 1)p² Σ_{x=2}^n [(n − 2)!/((x − 2)![(n − 2) − (x − 2)]!)] p^{x−2} q^{(n−2)−(x−2)}
                = n(n − 1)p² Σ_{y=0}^m [m!/(y!(m − y)!)] p^y q^{m−y} = n(n − 1)p²,

where in the next to last equality, we made the changes y = x − 2 and m = n − 2. Then E(X²) = E[X(X − 1) + X] = E[X(X − 1)] + E(X) = n(n − 1)p² + np, so

    Var(X) = E(X²) − μ² = n(n − 1)p² + np − n²p² = npq.
There are tables in the back of the textbook which give binomial probabilities. But
they only deal with a few values of n (from 5 to 25), and p.
Example. Exxon has just bought a large tract of land in northern Quebec, with the
hope of finding oil. Suppose they think that the probability that a test hole will result
in oil is .2. Assume that Exxon decides to drill 7 test holes. What is the probability that
(1) Exactly 3 of the test holes will strike oil?
(2) At most 2 of the test holes will strike oil?
(3) Between 3 and 5 (including 3 and 5) of the test holes will strike oil?
What are the mean and standard deviation of the number of test holes which strike oil?
Finally, how many test holes should be dug in order that the probability of at least one
striking oil is .9?
Solution. Let X = number of test holes that strike oil. Then X ~ Bin(n = 7, p = .2).

(1) P{X = 3} = C_3^7 (.2)³(.8)⁴ = 35(.2)³(.8)⁴ = .115.
(2) P{X ≤ 2} = P{X = 0} + P{X = 1} + P{X = 2} = C_0^7 (.2)⁰(.8)⁷ + C_1^7 (.2)¹(.8)⁶ + C_2^7 (.2)²(.8)⁵ = (.8)⁷ + 7(.2)(.8)⁶ + 21(.2)²(.8)⁵ = .852.
(3) P{3 ≤ X ≤ 5} = .148 (using table II in appendix).

E(X) = 7 × .2 = 1.4 and Var(X) = 7 × .2 × .8 = 1.12. For the last question, we have to find n so that P{X ≥ 1} = .9 or more. This is the same as P{X = 0} = .1 or less. But P{X = 0} = .8^n. Hence we have to find n so that .8^n = .1 or less. Since

    n     8     9     10    11
    .8^n  .167  .134  .107  .086

the smallest such n is 11, so 11 test holes should be dug.
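The binomial computations in this solution can be reproduced with `math.comb`; a sketch whose rounding matches the three-decimal answers above:

```python
from math import comb

n, p = 7, 0.2

def pmf(x):
    # Bin(7, .2) probability function
    return comb(n, x) * p**x * (1 - p)**(n - x)

print(round(pmf(3), 3))                          # 0.115
print(round(sum(pmf(x) for x in range(3)), 3))   # 0.852
print(round(sum(pmf(x) for x in (3, 4, 5)), 3))  # 0.148

# smallest n with P{X = 0} = .8**n at most .1
m = 1
while 0.8 ** m > 0.1:
    m += 1
print(m)  # 11
```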
2.2.2
The Geometric Distribution.

With the same independent Bernoulli trials, let Y be the number of the trial on which the first success occurs. Then P[Y = y] = pq^{y−1}, y = 1, 2, …, and Y is said to have the geometric distribution with parameter p.

Proposition 2.2.3. E(Y) = 1/p and Var(Y) = q/p².
Proof. We have

    E(Y) = Σ_{y=1}^∞ y pq^{y−1} = p Σ_{y=1}^∞ y q^{y−1} = p (d/dq) Σ_{y=0}^∞ q^y = p (d/dq) [1/(1 − q)] = p/(1 − q)² = 1/p.

Var(Y) is computed similarly, via E[Y(Y − 1)].

The geometric distribution is memoryless: since P[Y > y] = q^y,

    P[Y > m + n | Y > m] = P[Y > m + n, Y > m]/P[Y > m] = q^{m+n}/q^m = q^n = P[Y > n].        (2.1)
For the converse, assume (2.1) holds, and let g(y) = P[Y > y]. Then g(m + n) = g(m)g(n) for all m, n ≥ 1. This forces g(y) = g(1)^y for all y ≥ 1. Putting q = g(1) and p = 1 − g(1) gives P[Y = y] = P[Y > y − 1] − P[Y > y] = q^{y−1} − q^y = pq^{y−1}.
2.2.3
The Negative Binomial Distribution.

Now let Y be the number of the trial on which the rth success occurs; Y is said to have the negative binomial distribution with parameters r and p.

Proposition 2.2.5. E(Y) = r/p and Var(Y) = rq/p².
2.2.4
The Hypergeometric Distribution.

Suppose we have a box containing a total of N marbles, of which r are red and b are black (so r, b ≥ 0 and r + b = N). A sample of size n is chosen randomly and without replacement. Let Y be the number of red marbles in the sample. Then Y has probability function

    P[Y = y] = C_y^r C_{n−y}^{N−r} / C_n^N,   0 ≤ y ≤ r,  n − y ≤ N − r.
Proposition 2.2.6.

    E(Y) = nr/N   and   Var(Y) = n (r/N) ((N − r)/N) ((N − n)/(N − 1)).

2.2.5
The Poisson Distribution.
Definition.
A rv X with probability function

    P[X = x] = λ^x e^{−λ} / x!,   x = 0, 1, 2, …                    (2.2)

is said to have the Poisson distribution with parameter λ > 0. We write X ~ Poisson(λ).

Check. We did not derive this distribution. Hence we have to check that (2.2) really is a probability function. But obviously P[X = x] ≥ 0, and

    Σ_{x=0}^∞ P[X = x] = Σ_{x=0}^∞ λ^x e^{−λ}/x! = e^{−λ} Σ_{x=0}^∞ λ^x/x! = e^{−λ} e^{λ} = 1.

So ok.
Example. Suppose X ~ Poisson(λ) and P[X = 2] = 2P[X = 3]. Then λ²e^{−λ}/2! = 2λ³e^{−λ}/3!, so λ = 3/2. Then

    P[X = 4] = (1.5)⁴ e^{−1.5}/4! = .04707.
Proposition 2.2.7. If X ~ Poisson(λ), then E(X) = λ and Var(X) = λ.

Proof. We have

    E(X) = Σ_{x=0}^∞ x P[X = x] = Σ_{x=1}^∞ x λ^x e^{−λ}/x! = λ e^{−λ} Σ_{x=1}^∞ λ^{x−1}/(x − 1)! = λ.

To compute Var(X), we compute E[X(X − 1)] and proceed as with the binomial.
Proposition 2.2.8. Suppose X ~ Bin(n, p), and let n → ∞ and p → 0 in such a way that np = λ stays fixed. Then

    P[X = x] → λ^x e^{−λ}/x!.

Proof. We have

    C_x^n p^x (1 − p)^{n−x} = [n!/(x!(n − x)!)] (λ/n)^x (1 − λ/n)^{n−x}
        = [n(n − 1) ⋯ (n − x + 1)/n^x] (λ^x/x!) (1 − λ/n)^n (1 − λ/n)^{−x}
        = 1(1 − 1/n)(1 − 2/n) ⋯ (1 − (x − 1)/n) (λ^x/x!) (1 − λ/n)^n (1 − λ/n)^{−x}
        → λ^x e^{−λ}/x!,

since (1 − λ/n)^n → e^{−λ} and (1 − λ/n)^{−x} → 1.
Remark. Thus, for large n and small p, we can approximate the binomial probability C_x^n p^x (1 − p)^{n−x} by λ^x e^{−λ}/x!, where λ = np. This approximation is considered "good" if np ≤ 7.
Example. X ~ Bin(n = 20, p = .05).

    x                          0     1     2     3     4
    P[X = x] (exact binomial)  .358  .378  .189  .059  .013
    Poisson approx (λ = 1)     .368  .368  .184  .061  .015

2.3
Moment Generating Functions.
Let X be a r.v. If there exists h > 0 such that E(e^{tX}) < ∞ for all −h < t < h, then

    MX(t) =def E(e^{tX}),   −h < t < h,

is called the moment generating function (mgf) of X. For a discrete r.v., we have

    MX(t) = Σ_{x∈RX} e^{tx} fX(x).                                  (2.3)
Examples.
(1) If X = c, then MX(t) =def E(e^{tc}) = e^{tc}.
(2) If X ~ Bin(n, p), then

    MX(t) = Σ_{x=0}^n e^{tx} C_x^n p^x q^{n−x} = Σ_{x=0}^n C_x^n (pe^t)^x q^{n−x} = (pe^t + q)^n.

(3) If X has the geometric distribution with parameter p, then

    MX(t) = Σ_{x=1}^∞ e^{tx} pq^{x−1} = pe^t Σ_{x=1}^∞ (qe^t)^{x−1} = pe^t/(1 − qe^t) < ∞ if qe^t < 1.

qe^t < 1 is equivalent to t < log(1/q), so we may take h = log(1/q) > 0 (since p > 0).
(4) X ~ Poisson(λ). Then

    MX(t) = Σ_{x=0}^∞ e^{tx} λ^x e^{−λ}/x! = e^{−λ} Σ_{x=0}^∞ (λe^t)^x/x! = e^{−λ} e^{λe^t} = e^{−λ(1−e^t)},   t ∈ R.
Proposition 2.3.1.

    M_X^{(n)}(0) = E(X^n),   n = 0, 1, ….

Proof. MX(0) = E(e⁰) = E(1) = 1. From (2.3), we have

    M′X(t) = Σ_{x∈RX} x e^{tx} fX(x),
    M″X(t) = Σ_{x∈RX} x² e^{tx} fX(x),
    ⋮
    M_X^{(n)}(t) = Σ_{x∈RX} x^n e^{tx} fX(x),

and setting t = 0 gives E(X^n).
Examples.
(1) X ~ Bin(n, p). Then

    M′X(t) = (d/dt)(pe^t + q)^n = n(pe^t + q)^{n−1} pe^t,

so E(X) = M′X(0) = np.
(2) For the geometric distribution,

    M′X(t) = (d/dt)[pe^t/(1 − qe^t)] = [(1 − qe^t)pe^t − pe^t(−qe^t)]/(1 − qe^t)² = pe^t/(1 − qe^t)²,

so E(X) = M′X(0) = p/(1 − q)² = 1/p.
Example. Let X have probability function

    x       0   1   2   3
    fX(x)  .2  .3  .4  .1

Find the moment generating function of X and use it to calculate E(X) and Var(X).
Solution. MX(t) = .2 + .3e^t + .4e^{2t} + .1e^{3t}, so M′X(t) = .3e^t + .8e^{2t} + .3e^{3t} and M″X(t) = .3e^t + 1.6e^{2t} + .9e^{3t}. Then E(X) = M′X(0) = 1.4, E(X²) = M″X(0) = 2.8, and Var(X) = E(X²) − [E(X)]² = 2.8 − 1.4² = .84.
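The same moments fall out of numerically differentiating the mgf at t = 0; a small sketch using central differences (the step size h is an arbitrary choice):

```python
from math import exp

# probability function from the example above
pf = {0: 0.2, 1: 0.3, 2: 0.4, 3: 0.1}

def M(t):
    # mgf: .2 + .3e^t + .4e^(2t) + .1e^(3t)
    return sum(p * exp(t * x) for x, p in pf.items())

h = 1e-5
m1 = (M(h) - M(-h)) / (2 * h)          # central difference for M'(0) = E(X)
m2 = (M(h) - 2 * M(0) + M(-h)) / h**2  # central difference for M''(0) = E(X^2)
print(round(m1, 3), round(m2, 3))  # 1.4 2.8
print(round(m2 - m1**2, 3))        # 0.84
```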
Remark. Expanding e^{tx} in its power series, we have

    MX(t) = Σ_{x∈RX} e^{tx} fX(x) = Σ_{x∈RX} [1 + tx + (tx)²/2! + (tx)³/3! + ⋯] fX(x)
          = Σ_{n=0}^∞ (t^n/n!) Σ_{x∈RX} x^n fX(x) = Σ_{n=0}^∞ μ′_n t^n/n!,

where μ′_n = E(X^n). Thus the nth moment of X is the coefficient of t^n/n! in the power series expansion of MX(t).
Chapter 3
Continuous Random Variables.
3.1
Distribution Functions.
Definition. The distribution function of a r.v. X is F(x) = P[X ≤ x], −∞ < x < +∞.

Example. Let X be a discrete r.v. with probability function

    x       1   2   3   4   5   6
    fX(x)  .2  .1  .2  .1  .2  .2

Then

    F(x) = 0    if −∞ < x < 1,
           .2   if 1 ≤ x < 2,
           .3   if 2 ≤ x < 3,
           .5   if 3 ≤ x < 4,
           .6   if 4 ≤ x < 5,
           .8   if 5 ≤ x < 6,
           1    if 6 ≤ x < +∞.
Proposition 3.1.1. Every distribution function F(x) has the following properties:
(1) F is nondecreasing, i.e. if x ≤ y, then F(x) ≤ F(y),
(2) F(x) → 0 as x → −∞ and F(x) → 1 as x → +∞,
(3) F is continuous from above (from the right), i.e. F(y) → F(x) as y ↓ x.

Proof. (1) If x ≤ y, then {X ≤ x} ⊆ {X ≤ y}, so P{X ≤ x} ≤ P{X ≤ y}.
Remarks.
(1) Conversely, any function F : R → [0, 1] with the above three properties is called a distribution function. It can be shown that given any distribution function F, there exists a probability space (S, P) and on it a rv X which has F as its distribution function.
(2) If X is any r.v. and if a < b, we have

    P[a < X ≤ b] = F(b) − F(a).

This is because {X ≤ b} = {X ≤ a} ∪ {a < X ≤ b} (disjoint), so P{X ≤ b} = P{X ≤ a} + P{a < X ≤ b}.
(3) The figure below shows the distribution function of a continuous r.v., and of a
mixed (part discrete, part continuous) r.v.
3.2
Density Functions.
Definition. Let X be a r.v. with distribution function F(x). If there exists a function f : R → R such that

    F(x) = ∫_{−∞}^x f(t) dt,   x ∈ R,                               (3.1)

then X is called a continuous random variable with density function f. Note that if f is continuous, then by the fundamental theorem of calculus, we also have F′(x) = f(x) for all x.
Proposition 3.2.1. f has the properties:
(1) f(x) ≥ 0 for all x ∈ R,
(2) ∫_{−∞}^{+∞} f(x) dx = 1.

Proof. By the fundamental theorem of calculus, we have f(x) = F′(x) ≥ 0 since F is nondecreasing. Also,

    1 = lim_{x→+∞} F(x) = lim_{x→+∞} ∫_{−∞}^x f(t) dt = ∫_{−∞}^{+∞} f(x) dx.
Remarks.
(1) Conversely, any function f : R → R with the above two properties is called a density function.
(2) If f is a density function, then F defined by (3.1) is a distribution function, so
there exists a r.v. X having F as its distribution function and therefore f as its
density function.
Proposition 3.2.2. Let X be a continuous r.v. with density function f.
(1) If a < b, then

    P[a < X ≤ b] = ∫_a^b f(x) dx.

Note that this is the area under the graph of f between a and b.
(2) P[X = x] = 0 for every x ∈ R.

Proof.
(1) We have

    P[a < X ≤ b] = F(b) − F(a) = ∫_{−∞}^b f(x) dx − ∫_{−∞}^a f(x) dx = ∫_a^b f(x) dx.
(2) If ε > 0,

    P[X = x] ≤ P[x − ε < X ≤ x] = ∫_{x−ε}^x f(t) dt → 0 as ε ↓ 0,

implying that P[X = x] = 0.
Remark. In what follows, expectations E[h(X)] are defined provided ∫_{−∞}^{+∞} |h(x)| f(x) dx < +∞.

Definition. Let X be a continuous r.v. with density function f(x). The expected value (or mean, or expectation) of X is defined to be

    E(X) = ∫_{−∞}^{+∞} x f(x) dx,

and more generally, for a function g,

    E[g(X)] = ∫_{−∞}^{+∞} g(x) f(x) dx.

As in the discrete case,

    E[g1(X) + g2(X)] = ∫_{−∞}^{+∞} [g1(x) + g2(x)] f(x) dx = ∫_{−∞}^{+∞} g1(x) f(x) dx + ∫_{−∞}^{+∞} g2(x) f(x) dx = E[g1(X)] + E[g2(X)].
Definition. The variance of X is

    σ² = Var(X) = E[(X − μ)²] = ∫_{−∞}^{+∞} (x − μ)² f(x) dx,

where μ = E(X). Once again, we have

    Var(X) = E(X²) − μ².

This is because ∫ (x − μ)² f(x) dx = ∫ (x² − 2μx + μ²) f(x) dx = ∫ x² f(x) dx − 2μ ∫ x f(x) dx + μ² ∫ f(x) dx = E(X²) − 2μ² + μ² = E(X²) − μ².
Example. Suppose X has density function

    f(x) = kx²  if 0 < x < 1,
           0    otherwise.

Find
Find
(1) k,
(2) the distribution function F (x),
(3) P [ 41 < X < 12 ],
(4) E(X),
(5) Var(X).
Solution.
(1) 1 = ∫_{−∞}^{+∞} f(x) dx = ∫_0^1 kx² dx = k/3, so k = 3.
(2) F(x) = ∫_{−∞}^x f(t) dt. If x ≤ 0, then obviously F(x) = 0. If 0 < x < 1, then F(x) = ∫_0^x 3t² dt = x³. If 1 ≤ x < +∞, then F(x) = 1.
(3) P[1/4 < X < 1/2] = F(1/2) − F(1/4) = (1/2)³ − (1/4)³ = 7/64.
(4) E(X) = ∫_{−∞}^{+∞} x f(x) dx = ∫_0^1 x · 3x² dx = 3/4.
(5) E(X²) = ∫_{−∞}^{+∞} x² f(x) dx = ∫_0^1 x² · 3x² dx = 3/5, so Var(X) = 3/5 − (3/4)² = 3/80.
Proposition 3.2.5. Let X be a discrete or continuous r.v., and let a and b be constants.
Then
Var(aX + b) = a2 Var(X).
In the continuous case, the mgf is

    MX(t) = E(e^{tX}) = ∫_{−∞}^{+∞} e^{tx} f(x) dx;                  (3.2)

of course, for this mgf to exist, there has to be h > 0 such that the integral exists for all t with −h < t < h.
In the continuous case, the mgf generates moments exactly as in the discrete case.
Proposition 3.2.6.

    M_X^{(n)}(0) = E(X^n),   n = 0, 1, ….

Proof. MX(0) = E(e⁰) = E(1) = 1. From (3.2), we have

    M′X(t) = ∫_{−∞}^{+∞} x e^{tx} f(x) dx,
    M″X(t) = ∫_{−∞}^{+∞} x² e^{tx} f(x) dx,
    ⋮
    M_X^{(n)}(t) = ∫_{−∞}^{+∞} x^n e^{tx} f(x) dx,

and setting t = 0 gives E(X^n).
3.3
From the previous section, we know that if we specify a density function f (x), there
will exist a r.v. X having f (x) as its density function.
3.3.1
The Uniform Distribution.

Definition. A r.v. X with density function

    f(x) = 1/(b − a)  if a ≤ x ≤ b,
           0          otherwise,

is said to be uniformly distributed on [a, b].

Proposition.
(1) E(X) = (a + b)/2,
(2) Var(X) = (b − a)²/12,
(3) F(x) = 0 if x < a, (x − a)/(b − a) if a ≤ x ≤ b, and 1 if x > b,
(4) P[c ≤ X ≤ d] = (d − c)/(b − a) for a ≤ c ≤ d ≤ b,
(5) MX(t) = (e^{tb} − e^{ta})/(t(b − a)).
Proof.
(1) E(X) = ∫_a^b x/(b − a) dx = (1/(b − a)) [x²/2]_a^b = (b² − a²)/(2(b − a)) = (b + a)/2.
(2) E(X²) = ∫_a^b x²/(b − a) dx = (1/(b − a)) [x³/3]_a^b = (b³ − a³)/(3(b − a)) = (b² + ab + a²)/3, so

    Var(X) = (b² + ab + a²)/3 − (b + a)²/4 = (b − a)²/12.

(3) F(x) = ∫_{−∞}^x f(t) dt. Obviously, F(x) = 0 if x < a. If a ≤ x ≤ b, then F(x) = ∫_a^x dt/(b − a) = (x − a)/(b − a). If x > b, then F(x) = 1.
(4) P[c ≤ X ≤ d] = F(d) − F(c) = (d − a)/(b − a) − (c − a)/(b − a) = (d − c)/(b − a).
(5) MX(t) = ∫_a^b e^{tx}/(b − a) dx = (1/(b − a)) [e^{tx}/t]_a^b = (e^{tb} − e^{ta})/(t(b − a)).
3.3.2
The Exponential Distribution.

Definition. A r.v. Y with density function

    g(y) = (1/β) e^{−y/β}  if y > 0,
           0               if y ≤ 0,

is said to have the exponential distribution with parameter β > 0. We write Y ~ Exp(β). The distribution function is

    G(y) = 0             if y < 0,
           1 − e^{−y/β}  if y ≥ 0,

and the mgf is

    MY(t) = 1/(1 − βt)  if t < 1/β,
            +∞          if t ≥ 1/β.

Moreover E(Y) = β and Var(Y) = β².
Proof.
(1) E(Y) = ∫_0^{+∞} y (1/β) e^{−y/β} dy = β ∫_0^{+∞} w e^{−w} dw = β, after an integration by parts.
(2) E(Y²) = ∫_0^{+∞} y² (1/β) e^{−y/β} dy = β² ∫_0^{+∞} w² e^{−w} dw = 2β² after an integration by parts, so Var(Y) = 2β² − β² = β².
(4) MY(t) = ∫_0^{+∞} e^{ty} g(y) dy = (1/β) ∫_0^{+∞} e^{−y(1/β − t)} dy = (1/β) · 1/(1/β − t) = 1/(1 − βt) if t < 1/β, and = +∞ if t ≥ 1/β.
Expanding the mgf in a power series, for |t| < 1/β we have

    MY(t) = 1/(1 − βt) = Σ_{n=0}^∞ (βt)^n = Σ_{n=0}^∞ (n! β^n) t^n/n!,

so the moments of Y are μ′_n = E(Y^n) = n! β^n.

The exponential distribution is memoryless: for s, t ≥ 0,

    P[Y > s + t | Y > s] = P[Y > s + t, Y > s]/P[Y > s] = e^{−(s+t)/β}/e^{−s/β} = e^{−t/β} = P[Y > t].        (3.3)
For the converse, assume (3.3) holds and let h(y) = P[Y > y], y > 0. Then h(s + t) = h(s)h(t) for all s, t ≥ 0. This is Cauchy's equation and forces h(y) = e^{ay} for all y ≥ 0. Since h(y) ≤ 1 for all y, then a < 0. Thus the exponential distribution is the continuous analog of the geometric distribution.
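The memoryless identity can be checked directly from the tail P[Y > y] = e^{−y/β}; a small sketch, where β, s, and t are arbitrary choices:

```python
from math import exp, isclose

beta = 2.0

def tail(y):
    # P[Y > y] for Y ~ Exp(beta)
    return exp(-y / beta)

s, t = 1.3, 0.7
lhs = tail(s + t) / tail(s)  # P[Y > s + t | Y > s]
rhs = tail(t)                # P[Y > t]
print(isclose(lhs, rhs))  # True
```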
3.3.3
The Gamma Distribution.

Definition. The function

    Γ(α) = ∫_0^∞ x^{α−1} e^{−x} dx,   α > 0,

is called the gamma function. It satisfies
(1) Γ(α) < +∞ for every α > 0,
(2) Γ(1) = 1,
(3) Γ(α + 1) = αΓ(α), α > 0,
(4) Γ(n + 1) = n!, n = 0, 1, 2, …,
and Γ(1/2) = √π.

Definition. A r.v. X with density function

    f(x) = [1/(Γ(α)β^α)] x^{α−1} e^{−x/β}  if x > 0,
           0                               if x ≤ 0,

is said to have the gamma distribution with parameters α > 0 and β > 0. We write X ~ Gamma(α, β). To check that f is a density, substitute w = x/β:

    ∫_0^∞ [1/(Γ(α)β^α)] x^{α−1} e^{−x/β} dx = [1/Γ(α)] ∫_0^∞ w^{α−1} e^{−w} dw = 1.

The mgf is

    MX(t) = (1 − βt)^{−α}  if t < 1/β,
            +∞             if t ≥ 1/β.
(1)

    E(X) = ∫_{−∞}^{+∞} x f(x) dx = [1/(Γ(α)β^α)] ∫_0^∞ x^{(α+1)−1} e^{−x/β} dx = Γ(α + 1)β^{α+1}/(Γ(α)β^α) = αβ.

(2)

    E(X²) = ∫ x² f(x) dx = [1/(Γ(α)β^α)] ∫_0^∞ x^{(α+2)−1} e^{−x/β} dx = Γ(α + 2)β^{α+2}/(Γ(α)β^α) = α(α + 1)β² = αβ² + α²β²,

so Var(X) = E(X²) − (αβ)² = αβ².
(3)

    MX(t) = ∫_0^∞ e^{tx} f(x) dx = [1/(Γ(α)β^α)] ∫_0^∞ x^{α−1} e^{−x(1/β − t)} dx = (1 − βt)^{−α} for t < 1/β.
3.3.4
The Normal Distribution.

This is the most important distribution of all. The reason is the Central Limit Theorem, which we will see in chapter 6.

Definition. A r.v. X with density function

    f(x) = [1/(σ√(2π))] e^{−(1/2)((x−μ)/σ)²},   −∞ < x < ∞,        (3.4)

where μ ∈ R and σ > 0, is said to have a normal (or Gaussian) distribution with parameters μ and σ. We write X ~ N(μ, σ²).
When plotted, the density function looks like
A r.v. Z with distribution N(0, 1) is said to have the standard normal distribution.
Its density function looks like
Check. With the substitution y = (x − μ)/σ,

    ∫_{−∞}^{+∞} f(x) dx = (1/√(2π)) ∫_{−∞}^{+∞} e^{−y²/2} dy = (2/√(2π)) I,

where I = ∫_0^{+∞} e^{−y²/2} dy. Next, changing to polar coordinates,

    I² = ∫_0^{+∞} ∫_0^{+∞} e^{−(y²+z²)/2} dy dz = ∫_0^{π/2} ∫_0^{+∞} e^{−r²/2} r dr dθ = (π/2) ∫_0^{+∞} e^{−u} du = π/2,

where u = r²/2, so I = √(π/2) and ∫_{−∞}^{+∞} f(x) dx = (2/√(2π)) √(π/2) = 1.

Remark. The same calculation gives

    Γ(1/2) = ∫_0^{+∞} x^{−1/2} e^{−x} dx = √2 ∫_0^{+∞} e^{−w²/2} dw = √2 · √(π/2) = √π,

after the substitution x = w²/2.
Proposition. If X ~ N(μ, σ²) and Y = aX + b with a ≠ 0, then Y ~ N(aμ + b, a²σ²); its density is

    f(y) = [1/(|a|σ√(2π))] e^{−(y − aμ − b)²/(2a²σ²)},   −∞ < y < ∞.

In particular, Z = (X − μ)/σ ~ N(0, 1).

Proof.
(1)

    P[Z ≤ z] = P[(X − μ)/σ ≤ z] = P[X ≤ μ + σz] = (1/(σ√(2π))) ∫_{−∞}^{μ+σz} e^{−(1/2)((x−μ)/σ)²} dx = (1/√(2π)) ∫_{−∞}^z e^{−w²/2} dw,

after the substitution w = (x − μ)/σ.
(2) Similar to (1).
Proposition 3.3.7. If X ~ N(μ, σ²), then E(X) = μ and Var(X) = σ².

Proof. First suppose Z ~ N(0, 1). We have E(Z) = 0 either by odd symmetry, or

    E(Z) = (1/√(2π)) ∫_{−∞}^{+∞} z e^{−z²/2} dz = (1/√(2π)) [−e^{−z²/2}]_{−∞}^{+∞} = 0.

Next, substituting w = z²/2,

    E(Z²) = (1/√(2π)) ∫_{−∞}^{+∞} z² e^{−z²/2} dz = (2/√(2π)) ∫_0^{+∞} z² e^{−z²/2} dz = (2/√π) ∫_0^{+∞} w^{1/2} e^{−w} dw = (2/√π) Γ(3/2) = 1

(since Γ(α + 1) = αΓ(α) and Γ(1/2) = √π).

Now let X ~ N(μ, σ²). Define Z = (X − μ)/σ, so Z ~ N(0, 1). Since conversely X = σZ + μ, then E(X) = E(σZ) + E(μ) = σE(Z) + μ = μ. Also, E(X²) = E[σ²Z² + 2μσZ + μ²] = σ²E(Z²) + 2μσE(Z) + E(μ²) = σ² + 0 + μ². Then Var(X) = E(X²) − μ² = σ².
We could also have calculated the mean and variance here from the mgf of the
normal distribution, which is given in the next proposition.
Proposition 3.3.8. If X ~ N(μ, σ²), then

    MX(t) = e^{μt + σ²t²/2},   t ∈ R.
Some upper-tail quantiles z_α of the standard normal distribution (P[Z > z_α] = α):

    α    .1     .05    .025   .01    .005
    z_α  1.282  1.645  1.96   2.326  2.576
3.3.5 The Beta Distribution.
Definition. A r.v. X with density function
f(x) = (Γ(α + β)/(Γ(α)Γ(β))) x^{α−1}(1 − x)^{β−1}, 0 < x < 1 (and 0 otherwise),
where α > 0 and β > 0, is said to have a beta distribution with parameters α and β. We write X ∼ Beta(α, β). Notice that Unif(0, 1) = Beta(1, 1).
Check.
E(X) = α/(α + β), Var(X) = αβ/((α + β)²(α + β + 1)).
3.3.6 The Cauchy Distribution.
Definition. A r.v. X with density function
f(x) = (1/π) · 1/(1 + x²), −∞ < x < +∞,
is said to have the Cauchy distribution.
We have
∫_{−∞}^{+∞} (1/π) · 1/(1 + x²) dx = (2/π) ∫₀^{+∞} 1/(1 + x²) dx = (2/π) [tan^{−1}(x)]₀^{+∞} = (2/π)(π/2) = 1.
However,
∫_{−∞}^{0} x · (1/π) · 1/(1 + x²) dx + ∫₀^{+∞} x · (1/π) · 1/(1 + x²) dx = −∫₀^{+∞} y · (1/π) · 1/(1 + y²) dy + ∫₀^{+∞} x · (1/π) · 1/(1 + x²) dx = −∞ + ∞,
where y = −x. So E(X) does not exist.
Problem. A r.v. X with density function
f(x) = (2/π) · 1/(1 + x²) if x ≥ 0, 0 if x < 0,
is said to have the Cauchy distribution on [0, +∞). Show that f is a density function and that E(X) = +∞ (so here we do not have the ∞ − ∞ problem).
3.4
Chebychev's Inequality.
Proposition 3.4.1 (Markov's Inequality). Let Y be a nonnegative r.v. with finite mean, and let a > 0. Then P[Y ≥ a] ≤ E(Y)/a.
Proof. Suppose Y is continuous with density f. Then
E(Y) = ∫₀^{+∞} y f(y) dy ≥ ∫_a^{+∞} y f(y) dy ≥ a ∫_a^{+∞} f(y) dy = a P[Y ≥ a],
which gives the required result. The case where Y is discrete is identical.
Proposition 3.4.2 (Chebychev's Inequality). Let X be a r.v. with finite mean μ and variance σ². Then
P[|X − μ| > ε] ≤ σ²/ε²
for any ε > 0.
Proof. Applying the previous proposition, we have
P[|X − μ| > ε] = P[|X − μ|² > ε²] ≤ E(|X − μ|²)/ε² = σ²/ε².
Remark. Chebychev's inequality illustrates that the smaller the variance of X is, the closer the value of X is likely to be to its mean μ.
Remark. Recall that if X = c is a constant r.v., then E(X) = c and Var(X) = 0. The
following shows that the converse is also true.
Proposition 3.4.3. Let X be a random variable with mean μ for which Var(X) = 0. Then
P[X = μ] = 1.
Proof. By Chebychev's inequality, P[|X − μ| > ε] = 0 for every ε > 0. This can only be true if P[|X − μ| > 0] = 0.
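As a concrete check of Chebychev's inequality, one can compare the exact tail probability with the bound for a simple distribution. The fair-die example below is ours, not from the notes; `fractions.Fraction` keeps the arithmetic exact:

```python
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]
mu = Fraction(sum(faces), 6)                              # 7/2
var = sum((Fraction(x) - mu) ** 2 for x in faces) / 6     # 35/12

def tail(eps):
    """Exact P[|X - mu| > eps] for a fair die roll X."""
    return Fraction(sum(1 for x in faces if abs(Fraction(x) - mu) > eps), 6)

for eps in [1, 2]:
    # Chebychev: the exact tail never exceeds var / eps^2
    assert tail(eps) <= var / Fraction(eps) ** 2
```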
Chapter 4
Multivariate Distributions.
4.1
Definitions.
Definition.
Let Y1 and Y2 be discrete r.v.'s on the same sample space S. The function
f(y1, y2) = P[Y1 = y1, Y2 = y2], −∞ < y1, y2 < +∞,
is called the joint probability function (jpf) of Y1 and Y2; here {Y1 = y1, Y2 = y2} denotes the event {Y1 = y1} ∩ {Y2 = y2}. For any A ⊆ R²,
P[(Y1, Y2) ∈ A] = Σ_{(y1,y2)∈A} f(y1, y2).
Next, we want to define the continuous analog of a jpf. To do this, we need, just as
in the univariate situation, the idea of a joint distribution function.
Definition. Let Y1 and Y2 be (any kind of) r.v.'s defined on the same sample space S.
The function
F(y1, y2) = P[Y1 ≤ y1, Y2 ≤ y2], −∞ < y1, y2 < +∞,
is called the joint distribution function (JDF) of Y1 and Y2.
Proof. We have
{y1′ < Y1 ≤ y1, y2′ < Y2 ≤ y2} = {y1′ < Y1 ≤ y1, Y2 ≤ y2} \ {y1′ < Y1 ≤ y1, Y2 ≤ y2′},
so
P[y1′ < Y1 ≤ y1, y2′ < Y2 ≤ y2] = P[y1′ < Y1 ≤ y1, Y2 ≤ y2] − P[y1′ < Y1 ≤ y1, Y2 ≤ y2′].
Also, {y1′ < Y1 ≤ y1, Y2 ≤ y2} = {Y1 ≤ y1, Y2 ≤ y2} \ {Y1 ≤ y1′, Y2 ≤ y2}, so
P[y1′ < Y1 ≤ y1, Y2 ≤ y2] = P[Y1 ≤ y1, Y2 ≤ y2] − P[Y1 ≤ y1′, Y2 ≤ y2] = F(y1, y2) − F(y1′, y2). Similarly, P[y1′ < Y1 ≤ y1, Y2 ≤ y2′] = F(y1, y2′) − F(y1′, y2′).
Proposition 4.1.4 (Properties of a JDF). Let F(y1, y2) be the JDF of Y1 and Y2. Then
(1) F(−∞, −∞) = F(−∞, y2) = F(y1, −∞) = 0, F(+∞, +∞) = 1,
(2) If y1′ ≤ y1 and y2′ ≤ y2, then
F(y1, y2) − F(y1′, y2) − F(y1, y2′) + F(y1′, y2′) ≥ 0.
Remark.
Remark. Conversely, given a function F (y1 , y2 ) satisfying conditions (1) and (2), we
can find a probability space (S, P ) and on it r.v.s Y1 and Y2 having F (y1 , y2 ) as their
JDF.
Definition. Let Y1 and Y2 be r.v.s defined on the sample space S, with JDF F (y1 , y2 ).
If there exists a function f (y1 , y2 ) such that
F(y1, y2) = ∫_{−∞}^{y1} ∫_{−∞}^{y2} f(t1, t2) dt2 dt1, −∞ < y1, y2 < +∞, (4.2)
we say that Y1 and Y2 are jointly continuous with joint density function (jdf) f (y1 , y2 ).
Proposition 4.1.5. Let Y1 and Y2 be jointly continuous r.v.s with jdf f (y1 , y2 ). Then
(1) f(y1, y2) ≥ 0 for all y1, y2 ∈ R,
(2) ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} f(y1, y2) dy1 dy2 = 1.
Proposition 4.1.6. Let Y1 and Y2 be jointly continuous r.v.s with jdf f (y1 , y2 ). Then
P[a1 < Y1 ≤ b1, a2 < Y2 ≤ b2] = ∫_{a2}^{b2} ∫_{a1}^{b1} f(y1, y2) dy1 dy2, (4.3)
and more generally, for A ⊆ R²,
P[(Y1, Y2) ∈ A] = ∫∫_A f(y1, y2) dy1 dy2. (4.4)
Proof. Differentiating (4.2) and integrating back over the rectangle gives
F(b1, b2) − F(a1, b2) − F(b1, a2) + F(a1, a2) = ∫_{a2}^{b2} ∫_{a1}^{b1} f(y1, y2) dy1 dy2.
But from (4.1), the LHS here is P[a1 < Y1 ≤ b1, a2 < Y2 ≤ b2]. Thus, (4.3) is verified.
Note that if A in (4.4) is taken to be the rectangle [a1, b1] × [a2, b2], then (4.3) results. So (4.4) is true when A is a rectangle.
4.2 Marginal Distributions and the Expected Value of Functions of Random Variables.
If Y1 and Y2 are jointly continuous with jdf f(y1, y2), then Y1 is continuous with df
f1(y1) = ∫_{−∞}^{+∞} f(y1, y2) dy2,
and Y2 is continuous with df
f2(y2) = ∫_{−∞}^{+∞} f(y1, y2) dy1.
(In the discrete case, the marginal pf's are f1(y1) = Σ_{y2} f(y1, y2) and f2(y2) = Σ_{y1} f(y1, y2).)
f1(y1) and f2(y2) are called the marginal density functions of Y1 and Y2 respectively.
Proof.
(1) Since {Y1 = y1} = ∪_{y2∈R_{Y2}} {Y1 = y1, Y2 = y2}, then P[Y1 = y1] = Σ_{y2} P[Y1 = y1, Y2 = y2] = Σ_{y2} f(y1, y2).
(2) P[Y1 ≤ y1] = P[Y1 ≤ y1, −∞ < Y2 < +∞] = ∫_{−∞}^{y1} {∫_{−∞}^{+∞} f(t1, t2) dt2} dt1. By definition, Y1 is continuous with df ∫_{−∞}^{+∞} f(y1, y2) dy2. Here, we applied (4.4) with A taken to be the rectangle A = (−∞, y1] × (−∞, +∞).
Definition. Let g(y1, y2) be a real-valued function, and consider the random variable g(Y1, Y2).
(1) If Y1 and Y2 are discrete with joint probability function f (y1 , y2 ), then the expected value of g(Y1 , Y2 ) is
E[g(Y1, Y2)] = Σ_{all y2} Σ_{all y1} g(y1, y2) f(y1, y2).
(2) If Y1 and Y2 are jointly continuous with joint density function f (y1 , y2 ), then the
expected value of g(Y1 , Y2 ) is
E[g(Y1, Y2)] = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} g(y1, y2) f(y1, y2) dy1 dy2.
4.2.1
Special Theorems.
4.2.2
Covariance.
Definition. Let Y1 and Y2 have means μ1 and μ2. The quantity
Cov(Y1, Y2) = E[(Y1 − μ1)(Y2 − μ2)]
is called the covariance between Y1 and Y2. If σ1² and σ2² denote the variances of Y1 and Y2, then
ρ = Cov(Y1, Y2)/(σ1σ2)
is called the correlation coefficient between Y1 and Y2. Y1 and Y2 are called uncorrelated if Cov(Y1, Y2) = 0 (or ρ = 0).
Remarks. Cov(Y1 , Y2 ) = Cov(Y2 , Y1 ). If Y1 = Y2 , then Cov(Y1 , Y2 ) = Var(Y1 ). If Y2 is
constant, then Cov(Y1 , Y2 ) = 0.
Proposition 4.2.3. Cov(Y1, Y2) = E(Y1Y2) − μ1μ2.
Proof. We have
E[(Y1 − μ1)(Y2 − μ2)] = E[Y1Y2 − μ2Y1 − μ1Y2 + μ1μ2] = E(Y1Y2) − μ2E(Y1) − μ1E(Y2) + μ1μ2 = E(Y1Y2) − μ1μ2.
Example. Random variables X and Y have joint probability function given by the following table:

                y
           -1     1     2
  x  -2   .10   .05   .20
      0   .25   .15   .10
      3   .10   .05     0
Find:
(1) the marginals,
(2) P {X Y },
(3) E(X 2 Y ),
(4) Cov(X, Y ),
(5) the correlation coefficient XY .
(6) the moment generating function of Y .
Solution.
(1) The marginals are

  x      -2    0    3
  g(x)  .35   .5  .15

  y      -1    1    2
  h(y)  .45  .25  .30
(2) Let A = {(x, y) : x ≤ y}. Then P[X ≤ Y] = P[(X, Y) ∈ A] = Σ_{(x,y)∈A} f(x, y) = f(−2, −1) + f(−2, 1) + f(−2, 2) + f(0, 1) + f(0, 2) = .6.
(3) E(X²Y) = Σ_x Σ_y x²y f(x, y) = (−4)(.1) + (−9)(.1) + (4)(.05) + (9)(.05) + (8)(.2) = 0.95.
(4) E(XY) = Σ_x Σ_y xy f(x, y) = (2)(.1) + (−3)(.1) + (−2)(.05) + (3)(.05) + (−4)(.2) = −.85. Also, E(X) = −.25 and E(Y) = .4, so Cov(X, Y) = E(XY) − E(X)E(Y) = −.85 − (−.25)(.4) = −.75.
(5) E(X²) = 2.75, so Var(X) = 2.75 − (−.25)² = 2.6875. Also, E(Y²) = 1.9, so Var(Y) = 1.74. Hence
ρ_XY = −.75/√(2.6875 × 1.74) ≈ −.347.
(6) M_Y(t) = E(e^{tY}) = .45e^{−t} + .25e^{t} + .30e^{2t}.
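The arithmetic in this example is easy to slip on by hand, since several minus signs are involved. Here is a direct recomputation from the joint table, a sketch of ours with the table hard-coded:

```python
# joint pf f(x, y), read off the table in the example
f = {(-2, -1): .10, (-2, 1): .05, (-2, 2): .20,
     ( 0, -1): .25, ( 0, 1): .15, ( 0, 2): .10,
     ( 3, -1): .10, ( 3, 1): .05, ( 3, 2): .00}

ex  = sum(x * p for (x, y), p in f.items())          # E(X)  = -0.25
ey  = sum(y * p for (x, y), p in f.items())          # E(Y)  =  0.40
exy = sum(x * y * p for (x, y), p in f.items())      # E(XY) = -0.85
cov = exy - ex * ey                                  # Cov(X, Y) = -0.75
ex2 = sum(x * x * p for (x, y), p in f.items())      # E(X^2) = 2.75
ey2 = sum(y * y * p for (x, y), p in f.items())      # E(Y^2) = 1.90
rho = cov / ((ex2 - ex**2) * (ey2 - ey**2)) ** 0.5   # about -0.347
```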
Remark. Let Y1 and Y2 be jointly continuous with jdf f(y1, y2), and let R = {(y1, y2) ∈ R² : f(y1, y2) > 0}. Then for any A ⊆ R², we have
∫∫_A f(y1, y2) dy1 dy2 = ∫∫_{A∩R} f(y1, y2) dy1 dy2.
Reason. If A, B ⊆ R² and A ∩ B = ∅, then ∫∫_{A∪B} f(y1, y2) dy1 dy2 = ∫∫_A f(y1, y2) dy1 dy2 + ∫∫_B f(y1, y2) dy1 dy2. With R = {(y1, y2) : f(y1, y2) > 0}, for any A ⊆ R² we have A = (A ∩ R) ∪ (A ∩ R^c), so
∫∫_A f(y1, y2) dy1 dy2 = ∫∫_{A∩R} f(y1, y2) dy1 dy2 + ∫∫_{A∩R^c} f(y1, y2) dy1 dy2 = ∫∫_{A∩R} f(y1, y2) dy1 dy2 + 0.
Example. Random variables X and Y are jointly continuous with joint density function
f(x, y) = x + y if 0 < x < 1 and 0 < y < 1, 0 otherwise.
Find:
(1) the marginals,
(2) P {X 2 Y },
(3) E(X 2 Y ),
(4) Cov(X, Y ),
(5) the correlation coefficient XY .
(6) the moment generating function of Y .
Solution.
(1) fX(x) = ∫ f(x, y) dy = ∫₀^1 (x + y) dy = [xy + y²/2]₀^1 = x + 1/2 if x ∈ (0, 1), and fX(x) = 0 if x ∉ (0, 1).
By symmetry, we have
fY(y) = y + 1/2 if y ∈ (0, 1), 0 if y ∉ (0, 1).
[Fig. 4.3: graph of the marginal density fX(x) = x + 1/2 for 0 < x < 1.]
(3) We have
E[X²Y] = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} x²y f(x, y) dxdy = ∫₀^1 ∫₀^1 x²y(x + y) dxdy
= ∫₀^1 [x⁴y/4 + x³y²/3]₀^1 dy = ∫₀^1 (y/4 + y²/3) dy = [y²/8 + y³/9]₀^1 = 17/72.
(4) We have
E[XY] = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} xy f(x, y) dxdy = ∫₀^1 ∫₀^1 xy(x + y) dxdy
= ∫₀^1 [x³y/3 + x²y²/2]₀^1 dy = ∫₀^1 (y/3 + y²/2) dy = [y²/6 + y³/6]₀^1 = 1/3.
We also have
E(X) = ∫₀^1 x(x + 1/2) dx = 7/12, E(X²) = ∫₀^1 x²(x + 1/2) dx = 5/12,
and by symmetry E(Y) = 7/12 and E(Y²) = 5/12. Hence σ_X² = σ_Y² = 5/12 − (7/12)² = 11/144, and
Cov(X, Y) = E(XY) − (EX)(EY) = 1/3 − 49/144 = −1/144,
so
ρ_XY = Cov(X, Y)/(σ_X σ_Y) = (−1/144)/(11/144) = −1/11.
(6) M_Y(t) = ∫₀^1 e^{ty}(y + 1/2) dy = (te^t − e^t + 1)/t² + (e^t − 1)/(2t) for t ≠ 0, with M_Y(0) = 1.
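A numeric sanity check of these integrals, using midpoint-rule quadrature over the unit square (our sketch; the grid size is an arbitrary choice):

```python
n = 400
h = 1.0 / n
pts = [(i + 0.5) * h for i in range(n)]  # midpoints of a uniform grid on (0,1)

# E(XY) = integral of x*y*(x+y) over the unit square, exactly 1/3
exy = sum(x * y * (x + y) for x in pts for y in pts) * h * h
# E(X) = integral of x*(x + 1/2), exactly 7/12; E(Y) = E(X) by symmetry
ex = sum(x * (x + 0.5) * h for x in pts)
cov = exy - ex * ex                       # exactly -1/144

assert abs(exy - 1/3) < 1e-4
assert abs(ex - 7/12) < 1e-4
assert abs(cov + 1/144) < 1e-4
```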
f2(y2) = 6y2(1 − y2) if y2 ∈ [0, 1], 0 otherwise.
4.3 Conditional Probability and Density Functions.
Definition.
(1) If Y1 and Y2 are discrete with jpf f(y1, y2), if f2(y2) is the marginal pf of Y2, and if y2 is such that f2(y2) > 0, then
f_{1|2}(y1|y2) = P[Y1 = y1 | Y2 = y2] = P[Y1 = y1, Y2 = y2]/P[Y2 = y2] = f(y1, y2)/f2(y2)
is called the conditional probability function of Y1 given Y2 = y2.
(2) If Y1 and Y2 are jointly continuous with jdf f(y1, y2), if f2(y2) is the marginal df of Y2, and if y2 is such that f2(y2) > 0, then
f_{1|2}(y1|y2) = f(y1, y2)/f2(y2)
is called the conditional density function of Y1 given Y2 = y2.
Remark. In either case, f_{1|2}(y1|y2) is not defined for a y2 with f2(y2) = 0. For a fixed y2 for which f_{1|2}(y1|y2) is defined, it is a probability function (density function) as a function of y1. This is because
Σ_{y1} f_{1|2}(y1|y2) = Σ_{y1} f(y1, y2)/f2(y2) = (1/f2(y2)) Σ_{y1} f(y1, y2) = f2(y2)/f2(y2) = 1,
and
∫_{−∞}^{+∞} f_{1|2}(y1|y2) dy1 = ∫_{−∞}^{+∞} f(y1, y2)/f2(y2) dy1 = (1/f2(y2)) ∫_{−∞}^{+∞} f(y1, y2) dy1 = f2(y2)/f2(y2) = 1.
Definition. Let Y1 and Y2 be two r.v.'s on the same sample space. Let y2 be such that f2(y2) > 0, so that the conditional probability (density) function f_{1|2}(y1|y2) is defined. Then
E[g(Y1)|Y2 = y2] = Σ_{y1} g(y1) f_{1|2}(y1|y2) if Y1 is discrete,
E[g(Y1)|Y2 = y2] = ∫_{−∞}^{+∞} g(y1) f_{1|2}(y1|y2) dy1 if Y1 is continuous,
is called the conditional expectation of g(Y1) given that Y2 = y2.
Example. Random variables X and Y are jointly continuous with joint density function f(x, y) = x + y for 0 < x, y < 1, as in the previous example, so that
fY(y) = y + 1/2 if y ∈ (0, 1), 0 if y ∉ (0, 1).
Hence the conditional density function f(x|y) is undefined if y ∉ (0, 1). If y ∈ (0, 1), then
f(x|y) = f(x, y)/fY(y) = (x + y)/(y + 1/2) if x ∈ (0, 1), 0 otherwise.
In particular,
f(x|.5) = x + .5 if 0 < x < 1, 0 otherwise,
so E(X|Y = .5) = ∫₀^1 x(x + 1/2) dx = 7/12.
Remark. To say that a point W is chosen at random from the interval [a, b] means
that the r.v. W is uniformly distributed on [a, b].
Example. A point X is chosen at random from the interval (0, 1). Given that X = x, a second point Y is chosen at random from the interval (0, x). Find E[X|Y = 2/3].
Solution. We have
g(x) = 1 if 0 < x < 1, 0 otherwise,
f(y|x) = 1/x if 0 < y < x, 0 otherwise.
Hence
f(x, y) = f(y|x)g(x) = 1/x if 0 < y < x < 1, 0 otherwise,
so
h(y) = ∫_{−∞}^{+∞} f(x, y) dx = ∫_y^1 (1/x) dx = −log y if 0 < y < 1, 0 otherwise,
and then
f(x|y) = f(x, y)/h(y) = −1/(x log y) if y < x < 1, 0 otherwise.
Finally,
E[X|Y = y] = ∫_y^1 x f(x|y) dx = ∫_y^1 x · (−1/(x log y)) dx = (y − 1)/log y,
and thus
E[X|Y = 2/3] = 1/(3 log(3/2)).

4.4 Independent Random Variables.

Definition. Two r.v.'s Y1 and Y2 defined on the same sample space are said to be independent if
P[Y1 ∈ A, Y2 ∈ B] = P[Y1 ∈ A] P[Y2 ∈ B] for all A, B ⊆ R.
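The answer can also be checked by simulation: draw (X, Y) exactly as the example describes, keep the pairs with Y near 2/3, and average X. This is our sketch; the window width 0.01 and the sample size are arbitrary choices:

```python
import math
import random

random.seed(1)
kept = []
for _ in range(400_000):
    x = random.random()          # X ~ Unif(0, 1)
    y = x * random.random()      # given X = x, Y ~ Unif(0, x)
    if abs(y - 2/3) < 0.01:      # condition on Y close to 2/3
        kept.append(x)

estimate = sum(kept) / len(kept)
exact = 1 / (3 * math.log(1.5))  # (y - 1)/log y evaluated at y = 2/3
assert abs(estimate - exact) < 0.02
```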
Proposition 4.4.2. Two random variables Y1 and Y2 are independent iff F(y1, y2) = F1(y1)F2(y2) for all y1, y2 ∈ R. Here, F1(y1) = F(y1, +∞) = lim_{y2→+∞} F(y1, y2) is the marginal distribution function of Y1, and F2(y2) is the marginal distribution function of Y2.
Proposition 4.4.3. Two discrete random variables Y1 and Y2 with jpf f(y1, y2) and marginals f1(y1) and f2(y2) are independent iff f(y1, y2) = f1(y1)f2(y2) for all y1, y2 ∈ R.
Two jointly continuous random variables Y1 and Y2 with jdf f(y1, y2) and marginal density functions f1(y1) and f2(y2) are independent iff f(y1, y2) = f1(y1)f2(y2) for all y1, y2 ∈ R.
Problem. Show that if Y1 and Y2 are independent, then f12 (y1 |y2 ) = f1 (y1 ) and
E[g(Y1 )|Y2 = y2 ] = E[g(Y1 )].
Proposition 4.4.4. Let Y1 and Y2 be independent r.v.s and let g(Y1 ) and h(Y2 ) be functions only of Y1 and only of Y2 , respectively. Then
E[g(Y1 )h(Y2 )] = E[g(Y1 )]E[h(Y2 )],
provided these expectations exist. (Recall that E(X) is said to exist if E(|X|) < ∞.)
Proof. Consider the case where Y1 and Y2 are discrete. We have
E[g(Y1)h(Y2)] = Σ_{all y2} Σ_{all y1} g(y1)h(y2)f(y1, y2) = Σ_{all y2} Σ_{all y1} g(y1)h(y2)f1(y1)f2(y2)
= Σ_{all y2} h(y2)f2(y2) Σ_{all y1} g(y1)f1(y1) = E[g(Y1)] Σ_{all y2} h(y2)f2(y2) = E[g(Y1)]E[h(Y2)].
Example. Random variables X and Y have joint probability function given by the following table:

                y
           -1    0    1
  x  -1   1/8  1/8  1/8
      0   1/8   0   1/8
      1   1/8  1/8  1/8
We calculate E(XY) = 0, so X and Y are uncorrelated. The marginals are
g(x) = P[X = x] = 3/8 if x = −1, 2/8 if x = 0, 3/8 if x = 1,
h(y) = P[Y = y] = 3/8 if y = −1, 2/8 if y = 0, 3/8 if y = 1.
But f(0, 0) = 0 ≠ g(0)h(0) = 1/16, so X and Y are not independent: uncorrelated does not imply independent.
4.5 The Expected Value and Variance of Linear Functions of Random Variables.
Proposition 4.5.1. Let X1, . . . , Xm and Y1, . . . , Yn be random variables, and let
U = Σ_{i=1}^m a_i X_i and V = Σ_{j=1}^n b_j Y_j.
Then
(1) E(U) = Σ_{i=1}^m a_i E(X_i),
(2) Var(U) = Σ_{i=1}^m a_i² Var(X_i) + 2 ΣΣ_{i<j} a_i a_j Cov(X_i, X_j),
(3) Cov(U, V) = Σ_{i=1}^m Σ_{j=1}^n a_i b_j Cov(X_i, Y_j).
Proof.
(1) We have already seen this.
(3) We have
(U − E(U))(V − E(V)) = Σ_{i=1}^m Σ_{j=1}^n a_i b_j (X_i − μ_i^x)(Y_j − μ_j^y),
and so
Cov(U, V) = E[Σ_{i=1}^m Σ_{j=1}^n a_i b_j (X_i − μ_i^x)(Y_j − μ_j^y)] = Σ_{i=1}^m Σ_{j=1}^n a_i b_j E[(X_i − μ_i^x)(Y_j − μ_j^y)] = Σ_{i=1}^m Σ_{j=1}^n a_i b_j Cov(X_i, Y_j).
(2) Taking V = U in (3) gives
Var(U) = Σ_{i=1}^m Σ_{j=1}^m a_i a_j Cov(X_i, X_j) = ΣΣ_{i=j} a_i a_j Cov(X_i, X_j) + ΣΣ_{i≠j} a_i a_j Cov(X_i, X_j)
= Σ_{i=1}^m a_i² Var(X_i) + 2 ΣΣ_{i<j} a_i a_j Cov(X_i, X_j).
Examples.
(1) Cov(X, Y + Z) = Cov(X, Y ) + Cov(X, Z).
(2) Cov(aX, bY ) = abCov(X, Y ).
(3) Cov(3X − 2Y, X + 5Y) = Cov(3X − 2Y, X) + Cov(3X − 2Y, 5Y) = Cov(3X, X) + Cov(−2Y, X) + Cov(3X, 5Y) + Cov(−2Y, 5Y) = 3Cov(X, X) − 2Cov(Y, X) + 15Cov(X, Y) − 10Cov(Y, Y) = 3Var(X) + 13Cov(X, Y) − 10Var(Y).
Corollary 4.5.2. Suppose X1 , . . . , Xm are uncorrelated. Then
Var(U ) =
m
X
a2i Var(Xi ).
i=1
In particular, the variance of a sum of independent random variables is the sum of the
variances of the random variables.
Example. Let X and Y be independent random variables with distributions N(2, 3) and N(3, 2) respectively. Let
Z = 2X + Y, W = X − 3Y.
Find the distributions of Z and W, and the correlation coefficient ρ_ZW.
Solution.
(1) We have M_X(t) = e^{2t + 3t²/2} and M_Y(t) = e^{3t + t²}, so
M_Z(t) = M_X(2t)M_Y(t) = e^{4t + 6t²} e^{3t + t²} = e^{7t + 14t²/2},
and hence Z ∼ N(7, 14). Similarly, W ∼ N(−7, 21).
(2) Cov(Z, W) = Cov(2X + Y, X − 3Y) = 2Var(X) − 6Cov(X, Y) + Cov(Y, X) − 3Var(Y) = 2(3) − 3(2) = 0, so
ρ_ZW = Cov(Z, W)/(σ_Z σ_W) = 0.
Example. X and Y are jointly continuous with joint density
f(x, y) = c(x + y)e^{−x} if x > 0 and 0 < y < 1, 0 otherwise.
Let R = {(x, y) : 0 < x < +∞, 0 < y < 1}, and let A = {(x, y) : x < y}. Note that B (the triangle {0 < x < y < 1}) is A ∩ R.
(i) c must be chosen so that ∫_{−∞}^{+∞}∫_{−∞}^{+∞} f(x, y) dxdy = 1. We have
1 = c ∫₀^1 ∫₀^{+∞} (x + y)e^{−x} dxdy = c[∫₀^1 ∫₀^{+∞} xe^{−x} dxdy + ∫₀^1 ∫₀^{+∞} ye^{−x} dxdy] = c[1 + 1/2] = (3/2)c,
so c = 2/3.
(ii) We have
g(x) = ∫ f(x, y) dy = ∫₀^1 (2/3)(x + y)e^{−x} dy = (2/3)e^{−x}(x + 1/2) if x ≥ 0, 0 if x < 0,
h(y) = ∫ f(x, y) dx = ∫₀^{+∞} (2/3)(x + y)e^{−x} dx = (2/3)(1 + y) if y ∈ (0, 1), 0 otherwise.
(iii)
P[X < Y] = ∫∫_{A∩R} f(x, y) dxdy = ∫₀^1 ∫₀^y (2/3)(x + y)e^{−x} dxdy = (2/3)[5e^{−1} − 3/2] = 0.226,
so
P[X ≥ Y] = 1 − P[X < Y] = 0.774.
(iv)
μ_x = ∫₀^{+∞} x g(x) dx = (2/3) ∫₀^{+∞} x(x + 1/2)e^{−x} dx = (2/3) ∫₀^{+∞} x²e^{−x} dx + (1/3) ∫₀^{+∞} xe^{−x} dx = (2/3)E(W²) + (1/3)E(W) = 5/3,
where W ∼ Exp(λ = 1). Also,
μ_y = ∫ y h(y) dy = (2/3) ∫₀^1 y(1 + y) dy = 5/9.
Next,
E(XY) = (2/3) ∫₀^1 ∫₀^{+∞} xy(x + y)e^{−x} dxdy = (2/3)[∫₀^1 ∫₀^{+∞} x²y e^{−x} dxdy + ∫₀^1 ∫₀^{+∞} xy² e^{−x} dxdy]
= (2/3)[(1/2) ∫₀^{+∞} x²e^{−x} dx + (1/3) ∫₀^{+∞} xe^{−x} dx] = (2/3)[1 + 1/3] = 8/9.
Hence Cov(X, Y) = 8/9 − (5/3)(5/9) = −1/27.
(v)
E(X²) = ∫₀^{+∞} x² g(x) dx = (2/3) ∫₀^{+∞} x²(x + 1/2)e^{−x} dx = (2/3)[∫₀^{+∞} x³e^{−x} dx + (1/2) ∫₀^{+∞} x²e^{−x} dx] = (2/3)[Γ(4) + (1/2)Γ(3)] = 14/3,
and
E(Y²) = ∫ y² h(y) dy = (2/3) ∫₀^1 y²(1 + y) dy = 7/18.
Hence σ_x² = 14/3 − 25/9 = 17/9 and σ_y² = 7/18 − 25/81 = 13/162, so
ρ_xy = Cov(X, Y)/(σ_x σ_y) = (−1/27)/(√(17/9) √(13/162)) = −0.095.
(vi) We have
f(x|y) = f(x, y)/h(y) = (x + y)e^{−x}/(1 + y) if x ≥ 0 and 0 < y < 1, 0 otherwise.
Hence, for y ∈ (0, 1),
E(X|Y = y) = ∫₀^{+∞} x(x + y)e^{−x}/(1 + y) dx = (2 + y)/(1 + y).
Let W = X/σ_X − Y/σ_Y. Then
0 ≤ Var(W) = Var(X/σ_X) − 2Cov(X/σ_X, Y/σ_Y) + Var(Y/σ_Y) = 1 − 2ρ + 1 = 2 − 2ρ,
so ρ ≤ 1; applying the same argument to X/σ_X + Y/σ_Y gives ρ ≥ −1. Moreover,
ρ = 1 ⟹ Var(W) = 0 ⟹ P[X/σ_X − Y/σ_Y = c] = 1 for some constant c,
and in general
ρ = −1 ⟹ P[aX + bY = c] = 1 where a and b have the same sign,
ρ = 1 ⟹ P[aX + bY = c] = 1 where a and b have opposite signs.
Remark.
4.6 The Multinomial Distribution.

Definition. The random vector (X1, . . . , Xk) has the multinomial distribution with parameters n and p1, . . . , pk (where p1 + ··· + pk = 1) if its joint probability function is
f(x1, . . . , xk) = (n!/(x1! ··· xk!)) p1^{x1} ··· pk^{xk} if x1, . . . , xk ≥ 0 and x1 + ··· + xk = n, 0 otherwise.
Example. A large box of washers consists of 50% 1/4" washers, 30% 1/8" washers, and 20% 3/8" washers. Ten washers are chosen at random, with replacement. What is the probability of getting
(1) five 1/4" washers, four 1/8" washers, and one 3/8" washer?
(2) exactly six 1/8" washers?
(3) at most two kinds of washers among the chosen ones?
Solution. With obvious notation, we have
(1) P[X1 = 5, X2 = 4, X3 = 1] = (10!/(5!4!1!)) (.5)^5 (.3)^4 (.2)^1 = .064.
(2) P[X2 = 6] = (10!/(6!4!)) (.3)^6 (.7)^4 = .0368.
(3) Let A1 = {X1 = 0}, A2 = {X2 = 0}, A3 = {X3 = 0}, and A = {at most two kinds}.
Then A = A1 ∪ A2 ∪ A3, so
P(A) = P(A1) + P(A2) + P(A3) − P(A1∩A2) − P(A1∩A3) − P(A2∩A3) + P(A1∩A2∩A3)
= .5^10 + .7^10 + .8^10 − .2^10 − .3^10 − .5^10 + 0 = .1356157.
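The three washer probabilities can be reproduced directly; a sketch of ours, where `multinomial_p` is our helper name:

```python
from math import factorial

def multinomial_p(counts, probs):
    """Multinomial probability n!/(x1!...xk!) * p1^x1 * ... * pk^xk."""
    n = sum(counts)
    coef = factorial(n)
    for x in counts:
        coef //= factorial(x)
    p = float(coef)
    for x, q in zip(counts, probs):
        p *= q ** x
    return p

p1 = multinomial_p([5, 4, 1], [.5, .3, .2])   # part (1), ~ 0.064
p2 = multinomial_p([6, 4], [.3, .7])          # part (2): six 1/8" washers vs the rest
p3 = .5**10 + .7**10 + .8**10 - .2**10 - .3**10 - .5**10  # part (3), inclusion-exclusion
```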
4.7
4.7.1
Definition. Let Y1, Y2, . . . , Yn be discrete r.v.'s on the same sample space S. The function
f(y1, y2, . . . , yn) = P[Y1 = y1, Y2 = y2, . . . , Yn = yn], −∞ < y1, y2, . . . , yn < +∞,
is called the joint probability function (jpf) of Y1, . . . , Yn. If instead there exists a function f(y1, . . . , yn) ≥ 0 such that the joint distribution function satisfies
F(y1, . . . , yn) = ∫_{−∞}^{y1} ··· ∫_{−∞}^{yn} f(t1, . . . , tn) dtn ··· dt1, −∞ < y1, y2, . . . , yn < +∞,
we say that Y1, Y2, . . . , Yn are jointly continuous with joint density function (jdf) f(y1, y2, . . . , yn).
Proposition 4.7.2. Let Y1 , . . . , Yn be jointly continuous r.v.s with jdf f (y1 , . . . , yn ). Then
(1) f(y1, . . . , yn) ≥ 0 for all y1, . . . , yn ∈ R,
(2) ∫_{−∞}^{+∞} ··· ∫_{−∞}^{+∞} f(y1, . . . , yn) dy1 ··· dyn = 1.
Proposition 4.7.3. Let Y1 , . . . , Yn be jointly continuous r.v.s with jdf f (y1 , . . . , yn ). If
A ⊆ R^n, then
P[(Y1, . . . , Yn) ∈ A] = ∫···∫_A f(y1, . . . , yn) dy1 ··· dyn.
4.7.2
Examples (marginal).
(1) Let Y1, Y2, Y3, Y4 be discrete r.v.'s with jpf f(y1, y2, y3, y4). Then the jpf of Y2, Y3, Y4 is
f234(y2, y3, y4) = Σ_{y1} f(y1, y2, y3, y4).
The pf of Y2 is
f2(y2) = Σ_{y1} Σ_{y3} Σ_{y4} f(y1, y2, y3, y4).
(2) Let Y1, Y2, Y3, Y4 be jointly continuous r.v.'s with jdf f(y1, y2, y3, y4). Then the jdf of Y1 and Y3 is
f13(y1, y3) = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} f(y1, y2, y3, y4) dy2 dy4.
The df of Y2 is
f2(y2) = ∫∫∫ f(y1, y2, y3, y4) dy1 dy3 dy4.
Examples (conditional).
(1) Let Y1, Y2, Y3, Y4 be r.v.'s which are either discrete or jointly continuous with f(y1, y2, y3, y4). If y2, y4 are fixed and such that f24(y2, y4) > 0, then we define
f_{13|24}(y1, y3|y2, y4) = f(y1, y2, y3, y4)/f24(y2, y4).
In the discrete case,
f(y1, y2, y3, y4)/f24(y2, y4) = P[Y1 = y1, Y2 = y2, Y3 = y3, Y4 = y4]/P[Y2 = y2, Y4 = y4] = P[Y1 = y1, Y3 = y3 | Y2 = y2, Y4 = y4].
Proposition 4.7.4. Let Y1 , Y2 , . . . , Yn be r.v.s with joint probability (or density) function
f (y1 , y2 , . . . , yn ). Let f1 (y1 ), f2 (y2 ), . . . , fn (yn ) be the marginal probability (density)
functions of Y1 , Y2 , . . . , Yn respectively. Then Y1 , Y2 , . . . , Yn are independent iff
f (y1 , y2 , . . . , yn ) = f1 (y1 )f2 (y2 ) fn (yn )
for all y1 , y2 , . . . , yn R.
4.7.3
Definition.
(1) If Y1 , . . . , Yn are discrete with joint probability function f (y1 , . . . , yn ), then the
expected value of g(Y1 , . . . , Yn ) is
E[g(Y1, . . . , Yn)] = Σ_{all yn} ··· Σ_{all y2} Σ_{all y1} g(y1, y2, . . . , yn) f(y1, y2, . . . , yn).
(2) If Y1 , . . . , Yn are jointly continuous with joint density function f (y1 , . . . , yn ), then
the expected value of g(Y1 , . . . , Yn ) is
E[g(Y1, . . . , Yn)] = ∫_{−∞}^{+∞} ··· ∫_{−∞}^{+∞} g(y1, . . . , yn) f(y1, . . . , yn) dy1 ··· dyn.
Example of Conditional Expectations. Let Y1, Y2, Y3, Y4 be r.v.'s which are either discrete or jointly continuous with f(y1, y2, y3, y4). If y2, y4 are fixed and such that f24(y2, y4) > 0, then
E[g(Y1, Y3)|Y2 = y2, Y4 = y4] = Σ_{y1} Σ_{y3} g(y1, y3) f_{13|24}(y1, y3|y2, y4) if Y1, Y2, Y3, Y4 are discrete,
E[g(Y1, Y3)|Y2 = y2, Y4 = y4] = ∫∫ g(y1, y3) f_{13|24}(y1, y3|y2, y4) dy1 dy3 if Y1, Y2, Y3, Y4 are jointly continuous.
Chapter 5
Functions of Random Variables.
5.1
5.1.1
Proposition 5.1.1. Let X be a continuous r.v. with density function fX(x), and let y = φ(x) be a differentiable function which is either strictly increasing or strictly decreasing on {x | fX(x) > 0}. Define Y = φ(X). Then Y has density function
fY(y) = fX(x) |dx/dy|
(or equivalently
fY(y) = fX(φ^{−1}(y)) |dφ^{−1}(y)/dy|).
Proof.
P[Y ≤ y] = P[φ(X) ≤ y] = P[X ≤ φ^{−1}(y)] if φ is increasing, and P[Y ≤ y] = P[X ≥ φ^{−1}(y)] if φ is decreasing; that is,
P[Y ≤ y] = FX(φ^{−1}(y)) if φ is increasing, 1 − FX(φ^{−1}(y)) if φ is decreasing.
Differentiating,
fY(y) = FX′(φ^{−1}(y)) dφ^{−1}(y)/dy if φ is inc., −FX′(φ^{−1}(y)) dφ^{−1}(y)/dy if φ is dec.,
= fX(x) dx/dy if φ is inc., −fX(x) dx/dy if φ is dec.,
= fX(x) |dx/dy|,
since dx/dy > 0 if φ is increasing and dx/dy < 0 if φ is decreasing.
Examples.
(1) Suppose Y has distribution function FY(y) = 1 − e^{−y/2} if y ≥ 0, 0 if y < 0. Define X = FY(Y). Then X ∼ Unif(0, 1). This is true in general (see below).
(2) Let X ∼ N(0, 1), and define Y = X². Find the density of Y.
Solution. Notice that the monotone assumption of the above proposition is not satisfied. We have, for y > 0,
P[Y ≤ y] = P[X² ≤ y] = P[−√y ≤ X ≤ √y] = 2P[0 ≤ X ≤ √y] = 2 ∫₀^{√y} (1/√(2π)) e^{−x²/2} dx
(putting w = x²)
= 2 ∫₀^y (1/√(2π)) e^{−w/2} (1/(2√w)) dw = ∫₀^y (1/√(2πw)) e^{−w/2} dw,
so Y has density (1/√(2πw)) e^{−w/2} for w > 0; that is, Y ∼ Gamma(1/2, 2), the chi-square distribution with one degree of freedom.
Proposition. Let F be a distribution function, with inverse F^{−1}(u) defined for u ∈ (0, 1).
(1) Let U be uniformly distributed on (0, 1). Then the random variable X = F^{−1}(U) has distribution function F.
(2) If F is continuous and X is a random variable with distribution function F , then
U = F (X) is uniform on (0, 1).
Proof. Not difficult.
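Part (1) of this proposition is exactly how random variates are generated in practice. For the exponential distribution function FY(y) = 1 − e^{−y/2} of Example (1), the inverse is F^{−1}(u) = −2 log(1 − u); a sketch of ours:

```python
import math
import random

random.seed(0)

def sample_exp():
    """Inverse-transform sampling for F(y) = 1 - exp(-y/2), which has mean 2."""
    u = random.random()                # U ~ Unif(0, 1)
    return -2.0 * math.log(1.0 - u)   # X = F^{-1}(U)

xs = [sample_exp() for _ in range(200_000)]
mean = sum(xs) / len(xs)
assert abs(mean - 2.0) < 0.05          # sample mean close to the true mean 2
```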
5.1.2
∂(x, y)/∂(u, v) = det( ∂x/∂u  ∂x/∂v ; ∂y/∂u  ∂y/∂v ),
where the partial derivatives are continuous on S. Then U and V have joint density function given by
f_{U,V}(u, v) = f_{X,Y}(x, y) |∂(x, y)/∂(u, v)|.
Proof. This is the change-of-variables formula for double integrals, applied to P[(U, V) ∈ A] = ∫∫ f_{X,Y}(x, y) dxdy over the corresponding region in the (x, y)-plane.
Example. Let X and Y be independent N(0, 1) r.v.'s, and define U = X/Y, V = Y, so that x = uv, y = v and R_U = R_V = R. Then
∂(x, y)/∂(u, v) = det( v  u ; 0  1 ) = v,
so
f_{U,V}(u, v) = f_{X,Y}(x, y)|v| = (e^{−x²/2}/√(2π))(e^{−y²/2}/√(2π))|v| = (e^{−(u²v² + v²)/2}/(2π))|v| = (e^{−v²(u²+1)/2}/(2π))|v|, −∞ < u, v < ∞.
Hence
f_U(u) = ∫_{−∞}^{+∞} f_{U,V}(u, v) dv = (1/(2π))[∫_{−∞}^0 e^{−v²(u²+1)/2}(−v) dv + ∫₀^{+∞} e^{−v²(u²+1)/2} v dv] = (1/(2π)) · 2/(u² + 1) = 1/(π(u² + 1)), −∞ < u < ∞.
That is, the ratio U = X/Y of two independent standard normal r.v.'s has the Cauchy distribution.
5.2
5.2.1
Example. Suppose X and Y are independent Poisson random variables with parameters λx and λy. Find the distribution of Z = X + Y.
Solution. Note that R_Z = {0, 1, 2, . . .}. For z ∈ R_Z, we have {X + Y = z} = ∪_{x=0}^z {X = x, Y = z − x}, pairwise disjoint, so
P[Z = z] = Σ_{x=0}^z P[X = x, Y = z − x] = Σ_{x=0}^z P[X = x]P[Y = z − x] = Σ_{x=0}^z (λx^x e^{−λx}/x!)(λy^{z−x} e^{−λy}/(z − x)!)
= (e^{−(λx+λy)}/z!) Σ_{x=0}^z (z! / (x!(z − x)!)) λx^x λy^{z−x} = (e^{−(λx+λy)}/z!) (λx + λy)^z.
Thus X + Y ∼ Poisson(λx + λy).
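The convolution identity can be verified numerically by summing the product of the two pmfs, as in the derivation above (our sketch; the λ values are arbitrary):

```python
import math

def pois(k, lam):
    """Poisson pmf P[X = k] with parameter lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lx, ly = 1.5, 2.5
for z in range(10):
    # P[Z = z] computed as the convolution sum over x = 0..z
    conv = sum(pois(x, lx) * pois(z - x, ly) for x in range(z + 1))
    # should match the Poisson(lx + ly) pmf exactly
    assert abs(conv - pois(z, lx + ly)) < 1e-12
```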
5.2.2
Let X and Y have joint density function f(x, y). What is the distribution of Z = X + Y?
Solution. With R = {(x, y) : x + y ≤ z}, we have
P[Z ≤ z] = P[X + Y ≤ z] = P[(X, Y) ∈ R] = ∫∫_R f(x, y) dxdy = ∫_{−∞}^{+∞} ∫_{−∞}^{z−y} f(x, y) dxdy
= ∫_{−∞}^{+∞} ∫_{−∞}^{z} f(w − y, y) dwdy = ∫_{−∞}^{z} ∫_{−∞}^{+∞} f(w − y, y) dydw = ∫_{−∞}^{z} g(w) dw,
from which it follows that Z is a continuous r.v. with density g. That is, the density function of Z is
fZ(z) = ∫_{−∞}^{+∞} f(z − y, y) dy.
In particular, if X and Y are independent, fZ(z) = ∫_{−∞}^{+∞} fX(z − y)fY(y) dy.
For example, if X and Y are independent exponentials with rates λx and λy, then
fZ(z) = λ²ze^{−λz} if λx = λy = λ, and fZ(z) = (λxλy/(λx − λy))(e^{−λy z} − e^{−λx z}) if λx ≠ λy.
5.3 Moment Generating Functions.
5.3.1
So far, the theory of moment generating functions has been dispersed throughout these
notes. Here, we present a summary of this theory.
Definition. If X is a r.v. with probability (density) function f, then the moment generating function (mgf) of X is
MX(t) = E(e^{tX}) = Σ_x e^{tx} f(x) if X is discrete, ∫_{−∞}^{+∞} e^{tx} f(x) dx if X is continuous, (5.1)
for those t ∈ R for which the expectation is finite.
Proposition 5.3.1 (Properties of an Mgf).
(1) MX(0) = 1,
(2) MX^{(k)}(0) = E(X^k) for k = 1, 2, . . .,
(3) if X and Y are independent, then MX+Y(t) = MX(t)MY(t),
(4) the mgf, when finite in a neighbourhood of 0, determines the distribution of X.
Example. Suppose X and Y are independent Poisson random variables with parameters x and y . Find the distribution of Z = X + Y .
Solution. We have
MZ(t) = MX(t)MY(t) = e^{λx(e^t − 1)} e^{λy(e^t − 1)} = e^{(λx + λy)(e^t − 1)},
which is the mgf of the Poisson(λx + λy) distribution, so Z ∼ Poisson(λx + λy).
Example. Suppose X ∼ Gamma(α1, β) and Y ∼ Gamma(α2, β) are independent. Then
MX+Y(t) = (1/(1 − βt)^{α1}) (1/(1 − βt)^{α2}) = 1/(1 − βt)^{α1 + α2},
so X + Y ∼ Gamma(α1 + α2, β).
Example. Suppose X ∼ Bin(m, p) and Y ∼ Bin(n, p) are independent.
Solution. X + Y ∼ Bin(m + n, p).
Example. Suppose X ∼ N(μx, σx²) and Y ∼ N(μy, σy²) are independent. Then
MX+Y(t) = e^{μx t + σx²t²/2} e^{μy t + σy²t²/2} = e^{(μx + μy)t + (σx² + σy²)t²/2},
so X + Y ∼ N(μx + μy, σx² + σy²).
Example. Suppose X1, . . . , Xm are independent with Xi ∼ χ²(ni), and let X = X1 + ··· + Xm.
Solution. We have
MX(t) = MX1(t) ··· MXm(t) = (1/(1 − 2t)^{n1/2}) ··· (1/(1 − 2t)^{nm/2}) = 1/(1 − 2t)^{(n1 + ··· + nm)/2},
so X ∼ χ²(n1 + ··· + nm).
Chapter 6
Law of Large Numbers and the Central
Limit Theorem.
Definition. A sequence X1 , X2 , . . . of random variables, all defined on the sample space
S, is said to be i.i.d. (independent and identically distributed) if they all have the same
distribution function and every finite subset of them is independent.
6.1 The Law of Large Numbers.
Definition. Let Y and Y1, Y2, . . . be r.v.'s on the same sample space. We say that Yn converges in probability to Y, and write Yn →P Y as n → ∞, if
P[|Yn − Y| > ε] → 0 as n → ∞
for every ε > 0.
Proposition 6.1.1 (The (Weak) Law of Large Numbers). Let X1, X2, . . . be a sequence of i.i.d. random variables with finite mean μ and finite variance σ². Define Sn = X1 + X2 + ··· + Xn for n ≥ 1. Then
Sn/n →P μ
as n → ∞. That is,
P[|Sn/n − μ| > ε] → 0 as n → ∞
for every ε > 0.
Proof. E(Sn/n) = μ and Var(Sn/n) = σ²/n. By Chebychev, we have
P[|Sn/n − μ| > ε] ≤ Var(Sn/n)/ε² = σ²/(nε²) → 0
as n → ∞.
Remark. Suppose a coin with probability p of getting a head is tossed over and over. Let
Xi = 1 if the ith toss is a head, 0 otherwise.
Then Sn = X1 + ··· + Xn is the number of heads in the first n tosses, and
Sn/n →P p as n → ∞.
That is, the observed frequency of heads converges in probability to p.
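The coin-tossing form of the law of large numbers is easy to watch numerically; a sketch of ours, where p = 0.3 is an arbitrary choice:

```python
import random

random.seed(2)
p, n = 0.3, 100_000
# simulate n tosses of a coin with head probability p
heads = sum(1 for _ in range(n) if random.random() < p)
freq = heads / n                 # S_n / n, the observed frequency of heads
assert abs(freq - p) < 0.01      # close to p for large n
```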
6.2 The Central Limit Theorem.
Proposition (The Central Limit Theorem). Let X1, X2, . . . be i.i.d. random variables with finite mean μ and finite variance σ², and let Sn = X1 + ··· + Xn. Then
(Sn − nμ)/(σ√n) →d N(0, 1) as n → ∞,
meaning
P[(Sn − nμ)/(σ√n) ≤ x] → ∫_{−∞}^x (e^{−z²/2}/√(2π)) dz as n → ∞ for all x ∈ R. (6.1)
Remark. Equivalently, with X̄ = (X1 + ··· + Xn)/n,
P[(X̄ − μ)/(σ/√n) ≤ x] → ∫_{−∞}^x (e^{−z²/2}/√(2π)) dz as n → ∞ for all x ∈ R.
Problem. Forty-nine pieces of material are to be fitted together to form one large section. If the error made in each piece is uniform on [−1/8, 1/8], and the errors are independent, what is the approximate probability that the magnitude of the total error exceeds 1/4?
Solution. Let Xi = the error in the ith piece, and S49 = X1 + ··· + X49 be the total error. Then
E(Xi) = 0, Var(Xi) = (2/8)²/12 = 1/192,
so Var(S49) = 49/192 and, by the CLT,
P[|S49| > 1/4] = P[|S49|/√(49/192) > (1/4)/√(49/192)] ≈ P[|Z| > .49] = 2(.3121) ≈ .62.
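The total-error probability from the CLT involves only the standard normal cdf, which can be evaluated with the error function; our sketch:

```python
import math

def phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

sd = math.sqrt(49 / 192)     # standard deviation of S_49, with Var(X_i) = 1/192
z = 0.25 / sd                # standardized endpoint, about 0.49
p = 2 * (1 - phi(z))         # P[|S_49| > 1/4], about 0.62
assert 0.61 < p < 0.63
```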
Xi =
0
Xnp
npq
X np
x
npq
Zx
ez /2
dz as n for all x R
2
has approximately the distribution N(0, 1) for large n. The usual cri-
P (48.5 X 51.5) = P
X n d
N(0, 1) as n .
2n
82
The rectangular outline is the probability function p(x) of X ∼ Bin(3, 1/2). The smooth curve is the density function of N(1.5, .75). Suppose we want P[1 ≤ X ≤ 2] (i.e. P[X = 1 or 2]). This is the shaded rectangular area. To approximate it with a normal area, we should take the area under the normal density curve between 0.5 and 2.5 (rather than between 1 and 2).
[Table: upper-tail areas P[Z ≥ z] of the standard normal distribution, for z from 0.00 to 4.09. Rows give z in steps of 0.1 from 0.0 to 4.0; columns give the second decimal place, .00 through .09. Sample entries: P[Z ≥ 0.00] = 0.5000, P[Z ≥ 1.96] = 0.0250, P[Z ≥ 2.58] = 0.0049, P[Z ≥ 4.00] = 0.0000 to four decimals.]
[Table: central areas P[0 ≤ Z ≤ z] of the standard normal distribution, for z from 0.00 to 4.09. Rows give z in steps of 0.1 from 0.0 to 4.0; columns give the second decimal place, .00 through .09. Sample entries: P[0 ≤ Z ≤ 1.00] = 0.3413, P[0 ≤ Z ≤ 1.96] = 0.4750, P[0 ≤ Z ≤ 3.00] = 0.4987.]