Theorem 3.5.1. If X and Y have a discrete joint distribution with p.f. f, then the marginal p.f. of X is

    f_1(x) = \sum_y f(x, y).

The notation \sum_y implies that the summation is over all values of y such that f(x, y) is positive.
Recall the urn example in which an urn contains 4 green, 6 red and 10 black balls. Two balls are drawn randomly and without replacement. Let X count the number of green balls and Y count the number of reds. The (joint) probability distribution for (X, Y) is determined by counting the number of ways to draw 0 \le x \le 2 from the 4 greens, 0 \le y \le 2 from the 6 reds, and 2 - x - y from the 10 blacks, given that x + y \le 2. The joint probability function is

    f(x, y) = \begin{cases} \dfrac{\binom{4}{x} \binom{6}{y} \binom{10}{2-x-y}}{\binom{20}{2}}, & x, y \in \{0, 1, 2\} \text{ and } x + y \le 2, \\ 0, & \text{otherwise.} \end{cases}    (1)
The p.f. may also be presented by enumerating all possible values of (X, Y ) and their
probabilities. Table 1 illustrates.
Table 1: The joint probability function for the number of green (X) and red (Y) balls from the urn example.

             y
    x      0      1      2
    0    .237   .316   .079
    1    .210   .126     0
    2    .032     0      0
The marginal distributions of X and Y can be obtained by summing along rows (yielding
the marginal p.f. of X) and along columns (yielding the marginal p.f. of Y ). The result is
displayed in Table 2.
STAT 421 Lecture Notes 53
Table 2: Joint and marginal probability functions for the number of green (X) and red (Y) balls from the urn example.

             y
    x      0      1      2    f_1(x)
    0    .237   .316   .079    .632
    1    .210   .126     0     .336
    2    .032     0      0     .032
  f_2(y) .479   .442   .079      1
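Tables 1 and 2 can be reproduced directly from equation (1). The following Python sketch (not part of the original notes) computes the joint p.f. with `math.comb` and obtains the marginals by summing over the other variable, as in Theorem 3.5.1:

```python
from math import comb

# Joint p.f. from equation (1): X = number of green balls (of 4),
# Y = number of red balls (of 6); 2 balls drawn from 20 without replacement.
def f(x, y):
    if 0 <= x <= 2 and 0 <= y <= 2 and x + y <= 2:
        return comb(4, x) * comb(6, y) * comb(10, 2 - x - y) / comb(20, 2)
    return 0.0

# Marginals: sum over the other variable (Theorem 3.5.1).
f1 = {x: sum(f(x, y) for y in range(3)) for x in range(3)}
f2 = {y: sum(f(x, y) for x in range(3)) for y in range(3)}

print(round(f(0, 1), 3))  # 0.316, the (x=0, y=1) entry of Table 1
print(round(f1[0], 3))    # 0.632, the first row total in Table 2
print(round(f2[0], 3))    # 0.479, the first column total in Table 2
```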
Theorem 3.5.2. If X and Y have a continuous joint distribution with p.d.f. f, then the marginal p.d.f. f_1 of X is

    f_1(x) = \int_{-\infty}^{\infty} f(x, y) \, dy  for -\infty < x < \infty.
The proof is based on recognizing that a probability involving X alone, say Pr(X \le x), is equivalent to Pr(X \le x, Y < \infty). The condition that Y may take on any value y \in \mathbb{R} leads to the integral

    Pr(X \le x, Y < \infty) = \int_{-\infty}^{x} \int_{-\infty}^{\infty} f(s, y) \, dy \, ds = \int_{-\infty}^{x} f_1(s) \, ds.
Example 3.5.3 Suppose that X and Y have the following joint p.d.f.:

    f(x, y) = \begin{cases} \frac{21}{4} x^2 y, & x^2 \le y \le 1, \\ 0, & \text{otherwise.} \end{cases}

To find the marginals, the support of X and Y must be determined from the condition x^2 \le y \le 1. Obviously, the support is limited to y \le 1. Furthermore, 0 \le x^2 implies 0 \le y. Finally, x^2 \le y implies that -\sqrt{y} \le x \le \sqrt{y}. The support is shown in red:
[Figure: the support \{(x, y) : x^2 \le y \le 1\}, plotted in the (x, y) plane.]
Then

    f_1(x) = \int_{x^2}^{1} \frac{21}{4} x^2 y \, dy
           = \frac{21}{4} x^2 \left[ \frac{y^2}{2} \right]_{x^2}^{1}
           = \begin{cases} \frac{21}{8} x^2 (1 - x^4), & -1 \le x \le 1, \\ 0, & \text{otherwise.} \end{cases}
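The marginal just computed can be checked numerically. In this sketch (illustrative only), a midpoint rule integrates f(x, y) = (21/4) x^2 y over x^2 \le y \le 1 for a fixed x and the result is compared against the closed form (21/8) x^2 (1 - x^4):

```python
# Midpoint-rule approximation of f1(x): integrate (21/4) x^2 y over
# x^2 <= y <= 1 for a fixed x in [-1, 1].
def f1_numeric(x, n=100_000):
    lo, hi = x * x, 1.0
    h = (hi - lo) / n
    return sum(21 / 4 * x * x * (lo + (i + 0.5) * h) for i in range(n)) * h

# Closed form derived above.
def f1_closed(x):
    return 21 / 8 * x**2 * (1 - x**4)

print(abs(f1_numeric(0.5) - f1_closed(0.5)) < 1e-6)  # True
```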
Theorem 3.5.3. Suppose that X is discrete and Y is continuous, and the joint p.f./p.d.f. is f. Then, the marginal p.f. of X is

    f_1(x) = \int_{-\infty}^{\infty} f(x, y) \, dy, for all x.
Example 3.5.4 Suppose that X is discrete and Y is continuous, and the joint p.f./p.d.f. is

    f(x, y) = \begin{cases} \frac{x y^{x-1}}{3}, & x \in \{1, 2, 3\}, \ 0 < y < 1, \\ 0, & \text{otherwise.} \end{cases}
It's useful to establish some notation not used by DeGroot and Schervish.
Let I_A denote the indicator function of the set A. The function is defined according to

    I_A(x) = \begin{cases} 1, & x \in A, \\ 0, & x \notin A. \end{cases}
The marginal p.d.f. of Y is

    f_2(y) = \sum_{x=1}^{3} \frac{x y^{x-1}}{3} = \frac{1}{3} \left( 1 + 2y + 3y^2 \right) I_{(0,1)}(y).
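A quick numerical check of this sum (a sketch, not from the notes):

```python
# Marginal p.d.f. of Y in Example 3.5.4: sum x*y^(x-1)/3 over x in {1, 2, 3}.
def f2(y):
    return sum(x * y ** (x - 1) / 3 for x in (1, 2, 3)) if 0 < y < 1 else 0.0

# The sum should agree with the closed form (1 + 2y + 3y^2)/3 on (0, 1).
y = 0.4
print(abs(f2(y) - (1 + 2 * y + 3 * y**2) / 3) < 1e-12)  # True
```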
While it is always possible to compute the marginal distributions from the joint distribution, the reverse is generally not possible. The joint p.f./p.d.f. of random variables X and Y cannot be derived from the marginal distributions of X and Y unless X and Y are independent random variables.
Definition 3.5.2. Random variables X and Y are independent if, for every two sets of real numbers A and B,

    Pr(X \in A, Y \in B) = Pr(X \in A) Pr(Y \in B).
If X and Y are independent, then

    F(x, y) = Pr(X \le x, Y \le y) = Pr(X \le x) Pr(Y \le y) = F_1(x) F_2(y).

Thus, when X and Y are independent, the joint cumulative distribution function of X and Y can be constructed as the product of the univariate cumulative distribution functions.
Theorem 3.5.4. X and Y are independent if and only if F (x, y) = F1 (x)F2 (y) for all
real numbers x and y.
The next theorem is very useful for proving that two (or more) random variables are independent.
Theorem 3.5.5. Suppose that X and Y have a joint p.f./p.d.f. f. Then, X and Y are independent if and only if f can be factored as

    f(x, y) = h_1(x) h_2(y) for all real x and y,

where h_1 is a nonnegative function of x alone and h_2 is a nonnegative function of y alone.

Simply factoring f does not necessarily yield the marginals f_1 and f_2, since h_1 may differ from f_1 by a multiplicative constant (and similarly, factoring may yield h_2(y) = c f_2(y) for some c \ne 1).
A more intuitive definition of independent random variables is presented in the next section on conditional probability, but looking ahead: X and Y are independent discrete random variables if knowing that y is the realized value of Y does not change the probability that X takes on any particular value. Mathematically, X and Y are independent if and only if Pr(X = x | Y = y) = Pr(X = x) for all x and y. The definition extends to continuous random variables by replacing the events \{X = x\} and \{Y = y\} with events \{X \in A\} and \{Y \in B\}, where A and B are sets such that Pr(X \in A) > 0 and Pr(Y \in B) > 0.
Example Consider discrete random variables X and Y counting the number of heads when coins A and B are tossed at the same time. Coin B is fair, but A is not and yields Pr(X = 0) = 1/2 and Pr(X = 1) = 1/4 = Pr(X = 2). The marginal p.f.s are shown in the margins of Table 3. The body of Table 3 gives the joint p.f. of (X, Y). Each entry was obtained by computing Pr(X = x, Y = y) = Pr(X = x) Pr(Y = y).
Notice that \sum_{x=0}^{2} Pr(X = x, Y = y) = Pr(Y = y) and \sum_{y=0}^{2} Pr(X = x, Y = y) = Pr(X = x).
Table 3: The joint and marginal p.f.s of X and Y. Values in the body of the table are Pr(X = x, Y = y).

             y
    x      0      1      2    f_1(x)
    0     1/8    1/4    1/8    1/2
    1    1/16    1/8   1/16    1/4
    2    1/16    1/8   1/16    1/4
  f_2(y)  1/4    1/2    1/4
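Table 3 can be rebuilt from the marginals alone precisely because the two coins are tossed independently. A Python sketch using exact fractions:

```python
from fractions import Fraction as F

# Marginal p.f.s from the coin example.
fX = {0: F(1, 2), 1: F(1, 4), 2: F(1, 4)}  # biased coin A
fY = {0: F(1, 4), 1: F(1, 2), 2: F(1, 4)}  # fair coin B

# Independence: Pr(X = x, Y = y) = Pr(X = x) * Pr(Y = y).
joint = {(x, y): fX[x] * fY[y] for x in fX for y in fY}

print(joint[(0, 1)])                            # 1/4, as in Table 3
# Row and column sums recover the marginals.
print(sum(joint[(0, y)] for y in fY) == fX[0])  # True
print(sum(joint[(x, 2)] for x in fX) == fY[2])  # True
```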
Example The data in Table 4 enumerate the outcomes for the Titanic passengers and crew. Table 5 contains the proportion of all passengers and crew cross-classified into each cell (e.g., Pr(Survived, First) = 203/2201 = .092). The survivorship random variable, with marginal p.f. Pr(Survived) = .323 and Pr(Died) = .677, is not independent of the class random variable, since the product of the marginal probabilities Pr(First) = .148 and Pr(Survived) = .323 is .148 \times .323 = .0478 \ne .092 = Pr(Survived, First).
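The arithmetic behind this independence check can be laid out as a short sketch (the proportions are those quoted above):

```python
# Marginal and joint proportions from the Titanic cross-classification.
p_first = 0.148        # Pr(First)
p_survived = 0.323     # Pr(Survived)
p_joint = 203 / 2201   # Pr(Survived, First) = .092

# Under independence the product of marginals would match the joint proportion.
print(round(p_first * p_survived, 4))             # 0.0478
print(abs(p_first * p_survived - p_joint) > .01)  # True: not independent
```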
Example 3.5.9. Suppose that X and Y have the joint p.d.f.

    f(x, y) = \begin{cases} k x^2 y^2, & x^2 + y^2 \le 1, \\ 0, & \text{otherwise.} \end{cases}

It might appear that X and Y are independent random variables, since kx^2 y^2 is easily factored as two functions, each of which depends only on one variable. However, the support of (X, Y) is not rectangular with edges parallel to the x- and y-axes, so there is no possibility of defining the support of X without reference to Y.
Another view of this complication writes f using an indicator function to explicitly identify the support:

    f(x, y) = k x^2 y^2 I_{\{(r,s) : r^2 + s^2 \le 1\}}(x, y).

There is no possibility of factoring I_{\{(r,s) : r^2 + s^2 \le 1\}}(x, y) as two indicator functions, each of which involves only one variable. Factorization could be accomplished if the support were rectangular with edges parallel to the x- and y-axes, say [a, b] \times [c, d], since then the indicator factors as I_{[a,b]}(x) I_{[c,d]}(y). If this were true of f(x, y), then X and Y would be independent. DeGroot and Schervish establish the lack of independence of X and Y by making the rectangular support argument. They then give Theorem 3.5.6, which states that the support must be rectangular with edges parallel to the x- and y-axes for independence to hold.
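The failure of independence in Example 3.5.9 can also be seen numerically. In this sketch the normalizing constant k = 24/\pi is computed by hand in polar coordinates (it is not given in the notes, so treat it as an assumption): the point (0.8, 0.8) lies outside the unit disk, so the joint density vanishes there even though both marginals are positive.

```python
from math import pi, sqrt

# Joint p.d.f. k x^2 y^2 on the unit disk; k = 24/pi makes it integrate
# to 1 (computed in polar coordinates; an assumption, not from the notes).
k = 24 / pi

def f(x, y):
    return k * x * x * y * y if x * x + y * y <= 1 else 0.0

# Marginal of X: integrate y^2 over -sqrt(1-x^2) <= y <= sqrt(1-x^2).
def f1(x):
    if abs(x) >= 1:
        return 0.0
    b = sqrt(1 - x * x)
    return k * x * x * (2 * b**3 / 3)

# By symmetry f2 = f1.  At (0.8, 0.8) both marginals are positive but the
# joint density is 0, so f(x, y) != f1(x) f2(y): X and Y are not independent.
print(f1(0.8) > 0 and f(0.8, 0.8) == 0.0)  # True
```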
Theorems 3.5.4 and 3.5.6 can be used to establish that X and Y are independent. Alternatively, write

    f(x, y) = k e^{-(x+2y)} I_{\{(r,s) : 0 \le r, \ 0 \le s\}}(x, y) = h_1(x) h_2(y),
where h_1(x) = e^{-x} I_{[0,\infty)}(x) and h_2(y) = k e^{-2y} I_{[0,\infty)}(y). The function h_1 is a p.d.f. (it integrates to 1), but h_2 is not, at least until k is replaced by 2, the value for which f integrates to 1.
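Taking k = 2 (the value for which f integrates to 1 over the positive quadrant — a value computed here, so treat it as an assumption), both factors can be checked numerically. A sketch:

```python
from math import exp

# Midpoint rule on [0, hi]; the exponential tails beyond hi are negligible.
def integral(g, n=100_000, hi=40.0):
    h = hi / n
    return sum(g((i + 0.5) * h) for i in range(n)) * h

print(abs(integral(lambda x: exp(-x)) - 1) < 1e-3)          # True: h1 is a p.d.f.
print(abs(integral(lambda y: 2 * exp(-2 * y)) - 1) < 1e-3)  # True: 2e^{-2y} is a p.d.f.
```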