Theorem 3.5.1. If X and Y have a discrete joint distribution with p.f. f, then the marginal p.f. of X is

    f_1(x) = \sum_y f(x, y).

The notation \sum_y implies that the summation is over all values of y such that f(x, y) is positive.
Recall the urn example in which an urn contains 4 green, 6 red and 10 black balls. Two balls are drawn randomly and without replacement. Let X count the number of green balls and Y count the number of reds. The (joint) probability distribution for (X, Y) is determined by counting the number of ways to draw 0 \le x \le 2 from the 4 greens, 0 \le y \le 2 from the 6 reds, and 2 - x - y from the 10 blacks, given that x + y \le 2. The joint probability function is

    f(x, y) = \begin{cases} \dfrac{\binom{4}{x} \binom{6}{y} \binom{10}{2-x-y}}{\binom{20}{2}}, & x, y \in \{0, 1, 2\} \text{ and } x + y \le 2, \\ 0, & \text{otherwise.} \end{cases}    (1)
The p.f. may also be presented by enumerating all possible values of (X, Y ) and their
probabilities. Table 1 illustrates.
Table 1: The joint probability function for the number of green (X) and red (Y) balls from the urn example.

             y
    x      0      1      2
    0    .237   .316   .079
    1    .210   .126     0
    2    .032     0      0
The marginal distributions of X and Y can be obtained by summing along rows (yielding
the marginal p.f. of X) and along columns (yielding the marginal p.f. of Y ). The result is
displayed in Table 2.
STAT 421 Lecture Notes 53
Table 2: Joint and marginal probability functions for the number of green (X) and red (Y) balls from the urn example.

             y
    x      0      1      2    f_1(x)
    0    .237   .316   .079    .632
    1    .210   .126     0     .336
    2    .032     0      0     .032
  f_2(y) .479   .442   .079      1
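Tables 1 and 2 can be reproduced directly from equation (1). The following Python sketch (not part of the original notes) computes the joint p.f. with `math.comb` and obtains the marginals by summing over the other variable, as in Theorem 3.5.1:

```python
from math import comb

# Joint p.f. from equation (1): X = number of green balls (of 4),
# Y = number of red balls (of 6); 2 balls drawn from 20 without replacement.
def f(x, y):
    if 0 <= x <= 2 and 0 <= y <= 2 and x + y <= 2:
        return comb(4, x) * comb(6, y) * comb(10, 2 - x - y) / comb(20, 2)
    return 0.0

# Marginals: sum over the other variable (Theorem 3.5.1).
f1 = {x: sum(f(x, y) for y in range(3)) for x in range(3)}
f2 = {y: sum(f(x, y) for x in range(3)) for y in range(3)}

print(round(f(0, 1), 3))  # 0.316, the (x=0, y=1) entry of Table 1
print(round(f1[0], 3))    # 0.632, the first row total in Table 2
print(round(f2[0], 3))    # 0.479, the first column total in Table 2
```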
Theorem 3.5.2. If X and Y have a continuous joint distribution with p.d.f. f, then the marginal p.d.f. f_1 of X is

    f_1(x) = \int_{-\infty}^{\infty} f(x, y) \, dy  for -\infty < x < \infty.
The proof is based on recognizing that a probability involving X alone, say Pr(X \le x), is equivalent to Pr(X \le x, Y < \infty). The condition that Y may take on any value y \in \mathbb{R} leads to the integral

    Pr(X \le x, Y < \infty) = \int_{-\infty}^{x} \int_{-\infty}^{\infty} f(s, y) \, dy \, ds = \int_{-\infty}^{x} f_1(s) \, ds.
Example 3.5.3 Suppose that X and Y have the following joint p.d.f.:

    f(x, y) = \begin{cases} \frac{21}{4} x^2 y, & x^2 \le y \le 1, \\ 0, & \text{otherwise.} \end{cases}

To find the marginals, the support of X and Y must be determined from the condition x^2 \le y \le 1. Obviously, the support is limited to y \le 1. Furthermore, 0 \le x^2 implies 0 \le y. Finally, x^2 \le y implies that -\sqrt{y} \le x \le \sqrt{y}. The support is shown in red:
[Figure: the support \{(x, y) : x^2 \le y \le 1\}, plotted in the (x, y) plane.]
Then

    f_1(x) = \int_{x^2}^{1} \frac{21}{4} x^2 y \, dy
           = \frac{21}{4} x^2 \left[ \frac{y^2}{2} \right]_{x^2}^{1}
           = \begin{cases} \frac{21}{8} x^2 (1 - x^4), & -1 \le x \le 1, \\ 0, & \text{otherwise.} \end{cases}
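The marginal just computed can be checked numerically. In this sketch (illustrative only), a midpoint rule integrates f(x, y) = (21/4) x^2 y over x^2 \le y \le 1 for a fixed x and the result is compared against the closed form (21/8) x^2 (1 - x^4):

```python
# Midpoint-rule approximation of f1(x): integrate (21/4) x^2 y over
# x^2 <= y <= 1 for a fixed x in [-1, 1].
def f1_numeric(x, n=100_000):
    lo, hi = x * x, 1.0
    h = (hi - lo) / n
    return sum(21 / 4 * x * x * (lo + (i + 0.5) * h) for i in range(n)) * h

# Closed form derived above.
def f1_closed(x):
    return 21 / 8 * x**2 * (1 - x**4)

print(abs(f1_numeric(0.5) - f1_closed(0.5)) < 1e-6)  # True
```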
Theorem 3.5.3. Suppose that X is discrete and Y is continuous, and the joint p.f./p.d.f. is f. Then, the marginal p.f. of X is

    f_1(x) = \int_{-\infty}^{\infty} f(x, y) \, dy, for all x.
Example 3.5.4 Suppose that X is discrete and Y is continuous, and the joint p.f./p.d.f. is

    f(x, y) = \begin{cases} \frac{x y^{x-1}}{3}, & x \in \{1, 2, 3\}, \ 0 < y < 1, \\ 0, & \text{otherwise.} \end{cases}
It's useful to establish some notation not used by DeGroot and Schervish.
Let I_A denote the indicator function of the set A. The function is defined according to

    I_A(x) = \begin{cases} 1, & x \in A, \\ 0, & x \notin A. \end{cases}
The marginal p.d.f. of Y is

    f_2(y) = \sum_{x=1}^{3} \frac{x y^{x-1}}{3} = \frac{1}{3} \left( 1 + 2y + 3y^2 \right) I_{(0,1)}(y).
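A quick numerical check of this sum (a sketch, not from the notes):

```python
# Marginal p.d.f. of Y in Example 3.5.4: sum x*y^(x-1)/3 over x in {1, 2, 3}.
def f2(y):
    return sum(x * y ** (x - 1) / 3 for x in (1, 2, 3)) if 0 < y < 1 else 0.0

# The sum should agree with the closed form (1 + 2y + 3y^2)/3 on (0, 1).
y = 0.4
print(abs(f2(y) - (1 + 2 * y + 3 * y**2) / 3) < 1e-12)  # True
```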
While it is always possible to compute the marginal distributions from the joint distribution, the reverse is generally not possible. The joint p.f./p.d.f. of random variables X and Y cannot be derived from the marginal distributions of X and Y unless X and Y are independent random variables.
Definition 3.5.2. Random variables X and Y are independent if, for every two sets of real numbers A and B,

    Pr(X \in A, Y \in B) = Pr(X \in A) Pr(Y \in B).
If X and Y are independent, then

    F(x, y) = Pr(X \le x, Y \le y) = Pr(X \le x) Pr(Y \le y) = F_1(x) F_2(y).

Thus, when X and Y are independent, the joint cumulative distribution function of X and Y can be constructed as the product of the univariate cumulative distribution functions.
Theorem 3.5.4. X and Y are independent if and only if F (x, y) = F1 (x)F2 (y) for all
real numbers x and y.
The next theorem is very useful for proving that two (or more) random variables are independent.
Theorem 3.5.5. Suppose that X and Y have a joint p.f./p.d.f. f. Then, X and Y are independent if and only if f can be factored as

    f(x, y) = h_1(x) h_2(y) for all real x and y,

where h_1 is a nonnegative function of x alone and h_2 is a nonnegative function of y alone.

Simply factoring f does not necessarily yield the marginals f_1 and f_2, since h_1 may differ from f_1 by a multiplicative constant (and similarly, factoring may yield h_2(y) = c f_2(y) for some c \ne 1).
A more intuitive definition of independent random variables is presented in the next section on conditional probability, but looking ahead: X and Y are independent discrete random variables if knowing that y is the realized value of Y does not change the probability that X takes on any particular value. Mathematically, X and Y are independent if and only if Pr(X = x | Y = y) = Pr(X = x) for all x and y. The definition extends to continuous random variables by replacing the events \{X = x\} and \{Y = y\} with events \{X \in A\} and \{Y \in B\}, where A and B are sets such that Pr(X \in A) > 0 and Pr(Y \in B) > 0.
Example Consider discrete random variables X and Y counting the number of heads when coins A and B are tossed at the same time. Coin B is fair, but A is not and yields Pr(X = 0) = 1/2 and Pr(X = 1) = 1/4 = Pr(X = 2). The marginal p.f.s are shown in the margins of Table 3. The body of Table 3 gives the joint p.f. of (X, Y). Each entry was obtained by computing Pr(X = x, Y = y) = Pr(X = x) Pr(Y = y).
Notice that \sum_{x=0}^{2} Pr(X = x, Y = y) = Pr(Y = y) and \sum_{y=0}^{2} Pr(X = x, Y = y) = Pr(X = x).
Table 3: The joint and marginal p.f.s of X and Y. Values in the body of the table are Pr(X = x, Y = y).

             y
    x      0      1      2    f_1(x)
    0     1/8    1/4    1/8    1/2
    1    1/16    1/8   1/16    1/4
    2    1/16    1/8   1/16    1/4
  f_2(y)  1/4    1/2    1/4
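Table 3 can be rebuilt from the marginals alone precisely because the two coins are tossed independently. A Python sketch using exact fractions:

```python
from fractions import Fraction as F

# Marginal p.f.s from the coin example.
fX = {0: F(1, 2), 1: F(1, 4), 2: F(1, 4)}  # biased coin A
fY = {0: F(1, 4), 1: F(1, 2), 2: F(1, 4)}  # fair coin B

# Independence: Pr(X = x, Y = y) = Pr(X = x) * Pr(Y = y).
joint = {(x, y): fX[x] * fY[y] for x in fX for y in fY}

print(joint[(0, 1)])                            # 1/4, as in Table 3
# Row and column sums recover the marginals.
print(sum(joint[(0, y)] for y in fY) == fX[0])  # True
print(sum(joint[(x, 2)] for x in fX) == fY[2])  # True
```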
Example The data in Table 4 enumerate the outcomes for the Titanic passengers and crew. Table 5 contains the proportion of all passengers and crew cross-classified into each cell (e.g., Pr(Survived, First) = 203/2201 = .092). The survivorship random variable, with marginal p.f. Pr(Survived) = .323 and Pr(Died) = .677, is not independent of the class random variable, since the product of the marginal probabilities Pr(First) = .148 and Pr(Survived) = .323 is .148 \times .323 = .0478 \ne .092 = Pr(Survived, First).
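The arithmetic behind this independence check can be laid out as a short sketch (the proportions are those quoted above):

```python
# Marginal and joint proportions from the Titanic cross-classification.
p_first = 0.148        # Pr(First)
p_survived = 0.323     # Pr(Survived)
p_joint = 203 / 2201   # Pr(Survived, First) = .092

# Under independence the product of marginals would match the joint proportion.
print(round(p_first * p_survived, 4))             # 0.0478
print(abs(p_first * p_survived - p_joint) > .01)  # True: not independent
```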
Example 3.5.9. Suppose that X and Y have the joint p.d.f.

    f(x, y) = \begin{cases} k x^2 y^2, & x^2 + y^2 \le 1, \\ 0, & \text{otherwise.} \end{cases}

It might appear that X and Y are independent random variables, since kx^2 y^2 is easily factored as two functions, each of which depends only on one variable. However, the support of (X, Y) is not rectangular with edges parallel to the x- and y-axes, so there is no possibility of defining the support of X without reference to Y.
Another view of this complication writes f using an indicator function to explicitly identify the support:

    f(x, y) = k x^2 y^2 I_{\{(r,s) : r^2 + s^2 \le 1\}}(x, y).

There is no possibility of factoring I_{\{(r,s) : r^2 + s^2 \le 1\}}(x, y) as two indicator functions, each of which involves only one variable. Factorization could be accomplished if the support were rectangular with edges parallel to the x- and y-axes, say [a, b] \times [c, d], since then the indicator factors as I_{[a,b]}(x) I_{[c,d]}(y). If this were true of f(x, y), then X and Y would be independent. DeGroot and Schervish establish the lack of independence of X and Y by making the rectangular support argument. They then give Theorem 3.5.6, which states that the support must be rectangular with edges parallel to the x- and y-axes for independence to hold.
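The failure of independence in Example 3.5.9 can also be seen numerically. In this sketch the normalizing constant k = 24/\pi is computed by hand in polar coordinates (it is not given in the notes, so treat it as an assumption): the point (0.8, 0.8) lies outside the unit disk, so the joint density vanishes there even though both marginals are positive.

```python
from math import pi, sqrt

# Joint p.d.f. k x^2 y^2 on the unit disk; k = 24/pi makes it integrate
# to 1 (computed in polar coordinates; an assumption, not from the notes).
k = 24 / pi

def f(x, y):
    return k * x * x * y * y if x * x + y * y <= 1 else 0.0

# Marginal of X: integrate y^2 over -sqrt(1-x^2) <= y <= sqrt(1-x^2).
def f1(x):
    if abs(x) >= 1:
        return 0.0
    b = sqrt(1 - x * x)
    return k * x * x * (2 * b**3 / 3)

# By symmetry f2 = f1.  At (0.8, 0.8) both marginals are positive but the
# joint density is 0, so f(x, y) != f1(x) f2(y): X and Y are not independent.
print(f1(0.8) > 0 and f(0.8, 0.8) == 0.0)  # True
```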
Theorems 3.5.4 and 3.5.6 can be used to establish that X and Y are independent. Alternatively, write

    f(x, y) = k e^{-(x+2y)} I_{\{(r,s) : 0 \le r, \ 0 \le s\}}(x, y) = h_1(x) h_2(y),
where h_1(x) = e^{-x} I_{[0,\infty)}(x) and h_2(y) = k e^{-2y} I_{[0,\infty)}(y). The function h_1 is a p.d.f. (it integrates to 1), but h_2 is not, at least until k is replaced by 2, the value for which f integrates to 1.
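Taking k = 2 (the value for which f integrates to 1 over the positive quadrant — a value computed here, so treat it as an assumption), both factors can be checked numerically. A sketch:

```python
from math import exp

# Midpoint rule on [0, hi]; the exponential tails beyond hi are negligible.
def integral(g, n=100_000, hi=40.0):
    h = hi / n
    return sum(g((i + 0.5) * h) for i in range(n)) * h

print(abs(integral(lambda x: exp(-x)) - 1) < 1e-3)          # True: h1 is a p.d.f.
print(abs(integral(lambda y: 2 * exp(-2 * y)) - 1) < 1e-3)  # True: 2e^{-2y} is a p.d.f.
```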