
Introduction to Probability Theory

K. Suresh Kumar
Department of Mathematics
Indian Institute of Technology Bombay

October 1, 2017

LECTURES 18-19

Real valued transformations of multi random variables: In this subsection, we look at transformations of (X, Y ) of the form ϕ ◦ (X, Y ), i.e.
ϕ(X, Y ) where ϕ : R² → R, i.e. real valued functions of (X, Y ), for example
X + Y , XY etc. One can write down the distribution function of Z = ϕ ◦ (X, Y )
using the distribution µ of (X, Y ) as follows. Set

Az = {(x, y) | ϕ(x, y) ≤ z},  z ∈ R.

Then

FZ(z) = P({ϕ(X, Y ) ∈ (−∞, z]}) = P({(X, Y ) ∈ Az}) = µ(Az),

where µ denotes the joint distribution of (X, Y ) and FZ denotes the distribution function of Z = ϕ(X, Y ). Hence it is all about computing Az and then µ(Az).

Example 0.1 (Distribution of sum) Let X, Y be random variables with joint pdf f . Then one
can compute the distribution of the sum Z = X + Y as follows. Note

Az = {(x, y) | −∞ < x < ∞, −∞ < y ≤ z − x}.

Hence

FZ(z) = ∫_{−∞}^{∞} ∫_{−∞}^{z−x} f(x, y) dy dx
(put t = y + x)  = ∫_{−∞}^{∞} ∫_{−∞}^{z} f(x, t − x) dt dx
(change order of integration)  = ∫_{−∞}^{z} ∫_{−∞}^{∞} f(x, t − x) dx dt.

Hence X + Y has pdf given by

fZ(z) = ∫_{−∞}^{∞} f(x, z − x) dx.
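As an informal check (not part of the notes), the formula fZ(z) = ∫ f(x, z − x) dx can be verified numerically for a concrete joint density; here I take X, Y to be independent standard normals (my own choice), so that Z = X + Y ∼ N(0, 2):

```python
# Sketch: numerically checking f_Z(z) = ∫ f(x, z - x) dx for a chosen joint pdf.
import numpy as np
from scipy import integrate, stats

def joint_pdf(x, y):
    # X, Y independent standard normals, so f(x, y) = phi(x) * phi(y)
    return stats.norm.pdf(x) * stats.norm.pdf(y)

def pdf_of_sum(z):
    # f_Z(z) = integral of f(x, z - x) over x
    val, _ = integrate.quad(lambda x: joint_pdf(x, z - x), -np.inf, np.inf)
    return val

for z in [-1.0, 0.0, 2.5]:
    exact = stats.norm.pdf(z, loc=0.0, scale=np.sqrt(2.0))  # X + Y ~ N(0, 2)
    print(z, pdf_of_sum(z), exact)                           # the two values agree
```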

Corollary 0.1 Let X, Y be independent random variables with joint pdf f .
Then the pdf of X + Y is given by

fX+Y(z) = fX ∗ fY(z),

where fX ∗ fY denotes the convolution of fX and fY and is defined as

fX ∗ fY(z) = ∫_R fX(y) fY(z − y) dy = ∫_R fX(z − y) fY(y) dy .

Proof. The proof (exercise) follows immediately from f (x, y) = fX (x)fY (y)
and the above example.

Example 0.2 Let X, Y be independent exponential random variables with
parameters λ1 and λ2 respectively. Then

fX(x) = λ1 e^{−λ1 x} if x ≥ 0,   and   fX(x) = 0 if x < 0.

fY is given similarly. Now for z ≤ 0, clearly fX ∗ fY(z) = 0. For z > 0,

fX ∗ fY(z) = ∫_0^z λ1 e^{−λ1 x} λ2 e^{−λ2 (z−x)} dx
           = (λ1 λ2 / (λ2 − λ1)) (e^{−λ1 z} − e^{−λ2 z})   if λ1 ≠ λ2,
           = λ² z e^{−λ z}                                   if λ1 = λ2 = λ.

The above gives the pdf of X + Y .
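A numerical sanity check of the closed form above (the parameter values are arbitrary and only for illustration): the convolution integral is computed directly and compared with the formula, in both the λ1 ≠ λ2 and λ1 = λ2 cases:

```python
# Sketch: verifying the density of X + Y for independent exponentials.
import numpy as np
from scipy import integrate

def conv_numeric(z, lam1, lam2):
    # ∫_0^z lam1*e^{-lam1*x} * lam2*e^{-lam2*(z-x)} dx
    val, _ = integrate.quad(
        lambda x: lam1 * np.exp(-lam1 * x) * lam2 * np.exp(-lam2 * (z - x)), 0.0, z)
    return val

def conv_closed_form(z, lam1, lam2):
    if lam1 != lam2:
        return lam1 * lam2 / (lam2 - lam1) * (np.exp(-lam1 * z) - np.exp(-lam2 * z))
    return lam1 ** 2 * z * np.exp(-lam1 * z)

for z in [0.5, 1.0, 3.0]:
    print(conv_numeric(z, 1.0, 2.5), conv_closed_form(z, 1.0, 2.5))  # lam1 != lam2
    print(conv_numeric(z, 2.0, 2.0), conv_closed_form(z, 2.0, 2.0))  # lam1 == lam2
```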

Linear transformations of multi random variables: In this section we
look at the case when ϕ(x, y) = (x, y)A, where A is a 2 × 2 invertible (i.e. non
singular) matrix, and we find the distribution of ϕ(X, Y ). Note that

A = [ cos θ   −sin θ ]
    [ sin θ    cos θ ]

rotates (X, Y ) through an angle θ in the counter clockwise direction. To
find the distribution of (U, V ) = (X, Y )A, we use the following change of
variable formula. Let ϕ = (ϕ1, ϕ2) : R² → R² be a map (with continuous
partial derivatives) such that the Jacobian at (x, y)

J(ϕ1(x, y), ϕ2(x, y)) = det [ ∂ϕ1/∂x   ∂ϕ1/∂y ]
                            [ ∂ϕ2/∂x   ∂ϕ2/∂y ]   ≠ 0.

Then the area element under the mapping (x, y) ↦ (u = ϕ1(x, y), v = ϕ2(x, y))
makes the transformation dx dy ↦ du dv = |J(x, y)| dx dy.

(Footnote: here note that the infinitesimal (small) rectangle [x, x + dx] × [y, y + dy], i.e. dx × dy,
is approximately mapped to the parallelogram spanned by du and dv. Now du is the vector joining the
points (ϕ1(x, y), ϕ2(x, y)) and (ϕ1(x + dx, y), ϕ2(x + dx, y)) and hence

du = (ϕ1(x + dx, y) − ϕ1(x, y), ϕ2(x + dx, y) − ϕ2(x, y)) ∼ ( (∂ϕ1(x, y)/∂x) dx, (∂ϕ2(x, y)/∂x) dx ).

Similarly

dv ∼ ( (∂ϕ1(x, y)/∂y) dy, (∂ϕ2(x, y)/∂y) dy ).

Writing du = (∂ϕ1(x, y)/∂x) dx i + (∂ϕ2(x, y)/∂x) dx j + 0 k and dv = (∂ϕ1(x, y)/∂y) dy i + (∂ϕ2(x, y)/∂y) dy j + 0 k,

area of the parallelogram = |du × dv|
  = |(∂ϕ1(x, y)/∂x)(∂ϕ2(x, y)/∂y) − (∂ϕ2(x, y)/∂x)(∂ϕ1(x, y)/∂y)| dx dy
  = |J(ϕ1(x, y), ϕ2(x, y))| dx dy.)

Theorem 0.1 Let D be an elementary region in R² and f : D → R be
continuous. Let O be an open set in R² and ϕ = (ϕ1, ϕ2) : O → R² be such that

(i) ϕ is one to one,

(ii) ϕ1, ϕ2 have continuous partial derivatives on O,

(iii) J(ϕ1(x, y), ϕ2(x, y)) ≠ 0 on O,

(iv) there exists E ⊆ O such that E is elementary and ϕ(E) = D.

Then

∫∫_D f(u, v) du dv = ∫∫_E f(ϕ1(x, y), ϕ2(x, y)) |J(ϕ1(x, y), ϕ2(x, y))| dx dy.
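As an informal illustration (not part of the notes), the formula of Theorem 0.1 can be checked numerically for a concrete choice of f and ϕ; here ϕ is the polar-coordinate map and f(u, v) = e^{−(u²+v²)} on the unit disc, both my own choices:

```python
# Sketch: checking the change of variable formula for the polar-coordinate map
# (u, v) = (r cos t, r sin t), whose Jacobian is r, over the unit disc.
import numpy as np
from scipy import integrate

f = lambda u, v: np.exp(-(u ** 2 + v ** 2))

# Left-hand side: integral of f over the unit disc D.
# dblquad integrates func(y, x) over the inner variable first.
lhs, _ = integrate.dblquad(lambda v, u: f(u, v),
                           -1.0, 1.0,
                           lambda u: -np.sqrt(1.0 - u ** 2),
                           lambda u: np.sqrt(1.0 - u ** 2))

# Right-hand side: the same integral in polar coordinates, with the factor |J| = r.
rhs, _ = integrate.dblquad(lambda r, t: f(r * np.cos(t), r * np.sin(t)) * r,
                           0.0, 2.0 * np.pi,
                           lambda t: 0.0,
                           lambda t: 1.0)

print(lhs, rhs, np.pi * (1.0 - np.exp(-1.0)))  # all three values agree
```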

As an application of the above change of variable formula, we have the


following result.

Theorem 0.2 Let (X, Y ) be a random vector in R² with joint pdf f and A
be a non singular 2 × 2 matrix. Then (U, V ) = (X, Y )A has pdf g given by

g(u, v) = (1/|det A|) f((u, v)A^{−1}).

Proof: For u, v ∈ R, set Ruv = (−∞, u] × (−∞, v] and I = Ruv A^{−1}.
One can see that I is an elementary set. Now the distribution function
F(U,V) of (U, V ) is given by

F(U,V)(u, v) = P{(U, V ) ∈ Ruv}
[using that (x, y) ↦ (x, y)A is a bijection]  = P{(X, Y ) ∈ I}
  = ∫∫_I f(x, y) dx dy
(Theorem 0.1 applied to (u, v) ↦ (u, v)A^{−1})  = ∫∫_{Ruv} f((u, v)A^{−1}) (1/|det A|) du dv.

Hence (U, V ) has a pdf, say g, given by

g(u, v) = (1/|det A|) f((u, v)A^{−1}).
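A hedged numerical sanity check of Theorem 0.2 (my own illustration, not part of the notes): take (X, Y ) to be independent standard normals, pick an arbitrary non-singular A, and compare g(u, v) = f((u, v)A^{−1})/|det A| with the known density of (X, Y )A:

```python
# Sketch: checking g(u, v) = f((u, v) A^{-1}) / |det A| for a Gaussian example.
import numpy as np
from scipy.stats import multivariate_normal

A = np.array([[2.0, 1.0],
              [0.5, 1.5]])           # any non-singular 2x2 matrix (my choice)
A_inv = np.linalg.inv(A)
f = multivariate_normal(mean=[0.0, 0.0], cov=np.eye(2)).pdf   # joint pdf of (X, Y)

def g(u, v):
    xy = np.array([u, v]) @ A_inv    # (u, v) A^{-1}, row-vector convention
    return f(xy) / abs(np.linalg.det(A))

# With the row-vector convention (U, V) = (X, Y) A, the covariance of (U, V) is A^T A.
g_exact = multivariate_normal(mean=[0.0, 0.0], cov=A.T @ A).pdf

for point in [(0.0, 0.0), (1.0, -0.5), (2.0, 1.0)]:
    print(g(*point), g_exact(list(point)))   # the two densities agree at each point
```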

Multinormal distribution. X = (X1, X2) is said to be a non degenerate multinormal
with parameters µ = (µ1, µ2) and

Σ = [ σ1²   σ12 ]
    [ σ21   σ2² ]

if the joint pdf fX is given by

fX(x1, x2) = (1/(2π √(det Σ))) e^{−(1/2)(x−µ)Σ^{−1}(x−µ)⊥},

where Σ is symmetric positive definite and (x − µ)⊥ is the column vector
corresponding to the row vector (x1 − µ1, x2 − µ2). Also, Σ positive definite
implies that |σ12| < σ1 σ2. Rewrite Σ as

Σ = [ σ1²      ρσ1σ2 ]
    [ ρσ1σ2    σ2²   ]

then (exercise)

fX(x1, x2) = (1/(2πσ1σ2 √(1 − ρ²))) exp{ −(1/(2(1 − ρ²))) [ (x1 − µ1)²/σ1² − 2ρ(x1 − µ1)(x2 − µ2)/(σ1σ2) + (x2 − µ2)²/σ2² ] }.
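The ρ-form above is left as an exercise; a quick numerical cross-check (my own illustration, with arbitrary parameter values) compares it against the matrix form of the density:

```python
# Sketch: the rho-form of the bivariate normal density versus the matrix form.
import numpy as np
from scipy.stats import multivariate_normal

mu1, mu2, s1, s2, rho = 1.0, -2.0, 1.5, 0.8, 0.6
Sigma = np.array([[s1 ** 2, rho * s1 * s2],
                  [rho * s1 * s2, s2 ** 2]])

def pdf_rho_form(x1, x2):
    q = ((x1 - mu1) ** 2 / s1 ** 2
         - 2.0 * rho * (x1 - mu1) * (x2 - mu2) / (s1 * s2)
         + (x2 - mu2) ** 2 / s2 ** 2)
    return np.exp(-q / (2.0 * (1.0 - rho ** 2))) / (
        2.0 * np.pi * s1 * s2 * np.sqrt(1.0 - rho ** 2))

pdf_matrix_form = multivariate_normal(mean=[mu1, mu2], cov=Sigma).pdf

for pt in [(1.0, -2.0), (0.0, 0.0), (2.5, -1.0)]:
    print(pdf_rho_form(*pt), pdf_matrix_form(list(pt)))   # identical up to rounding
```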

Now we will see the marginal pdfs of the multinormal.

The essence of the calculation is 'completing the square'. I will do it separately. Consider

(x1 − µ1)²/σ1² − 2ρ(x1 − µ1)(x2 − µ2)/(σ1σ2) + (x2 − µ2)²/σ2²
  = [ (x2 − µ2)²/σ2² − 2ρ(x1 − µ1)(x2 − µ2)/(σ1σ2) + (ρ(x1 − µ1)/σ1)² ] + (1 − ρ²) ((x1 − µ1)/σ1)²
  = ( (x2 − µ2)/σ2 − ρ(x1 − µ1)/σ1 )² + (1 − ρ²) ((x1 − µ1)/σ1)²
  = ( (x2 − µ2)/σ2 − a )² + (1 − ρ²) ((x1 − µ1)/σ1)² ,
where

a = ρ(x1 − µ1)/σ1 .

Now

∫_{−∞}^{∞} fX(x1, x2) dx2 = ( e^{−(1/2)((x1 − µ1)/σ1)²} / (2πσ1σ2 √(1 − ρ²)) ) ∫_{−∞}^{∞} e^{ −(1/(2(1 − ρ²))) ((x2 − µ2)/σ2 − a)² } dx2

[ put x = ( (x2 − µ2)/σ2 − a ) / √(1 − ρ²) ]
  = ( e^{−(1/2)((x1 − µ1)/σ1)²} / (√(2π) σ1) ) · (1/√(2π)) ∫_{−∞}^{∞} e^{−x²/2} dx
  = (1/(√(2π) σ1)) e^{−(1/2)((x1 − µ1)/σ1)²} .

Hence X1 ∼ N(µ1, σ1²). Similarly X2 ∼ N(µ2, σ2²).
Theorem 0.3 Let X = (X1, X2) be a non degenerate multinormal random
variable with parameters µ and Σ. Then for any α, β ∈ R, αX1 + βX2 is a
normal random variable.
Proof: Since X − µ is multinormal with parameters 0 and Σ, it is enough to
prove the theorem when µ = 0 (exercise).

Let A be a 2 × 2 symmetric matrix such that AA⊥ = Σ. [Here the
choice of A is Σ^{1/2}, and Σ^{1/2} = P Λ^{1/2} P^{−1}, where Λ is the diagonal matrix of
eigenvalues of Σ (so that Σ = P Λ P^{−1}) and hence Λ^{1/2} is the diagonal matrix
with diagonal entries the square roots of the eigenvalues of Σ.] Define
Y = XA^{−1}; then using Theorem 0.2, the pdf g of Y exists and is given by

g(y1, y2) = |det A| f(yA)
          = √(det Σ) · (1/(2π √(det Σ))) e^{−(1/2) yAΣ^{−1}Ay⊥}
          = (1/(2π)) e^{−(1/2)‖y‖²} ,

since AΣ^{−1}A = I.

Hence Y is multinormal with parameters 0 and I. Therefore g(y1, y2) =
gY1(y1) gY2(y2). This implies that Y1 and Y2 are independent standard normal
random variables.
Now, since X = Y A, we can see that αX1 + βX2 = aY1 + bY2 for some a, b ∈ R and hence
is a normal random variable. (exercise)
This completes the proof.
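The following sketch mirrors the construction in the proof (the specific Σ is arbitrary): A = Σ^{1/2} is built from the eigendecomposition, and a Monte Carlo draw confirms that Y = XA^{−1} has approximately identity covariance:

```python
# Sketch: the symmetric square root A = Sigma^{1/2} via the eigendecomposition,
# and a Monte Carlo check that Y = X A^{-1} is approximately N(0, I).
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

eigvals, P = np.linalg.eigh(Sigma)        # Sigma = P diag(eigvals) P^T, P orthogonal
A = P @ np.diag(np.sqrt(eigvals)) @ P.T   # symmetric square root, so A A = Sigma
print(np.allclose(A @ A, Sigma))          # True

X = rng.multivariate_normal(mean=[0.0, 0.0], cov=Sigma, size=200_000)  # rows are samples
Y = X @ np.linalg.inv(A)                  # Y = X A^{-1}
print(np.cov(Y, rowvar=False))            # close to the 2x2 identity matrix
```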

Remark 0.1 The proof of Theorem 0.3 tells us something more. Let X ∼
N(µ, Σ), i.e. X is a multinormal random variable with parameters µ and
Σ. Set X̄ = X − µ; then X̄ ∼ N(0, Σ) and Ȳ = X̄Σ^{−1/2} ∼ N(0, I). Hence

Y := XΣ^{−1/2} = µΣ^{−1/2} + X̄Σ^{−1/2} ∼ N(µΣ^{−1/2}, I).

Note that the above is a generalization to the multidimensional case of the
following result for normal random variables:

if X ∼ N(µ, σ²), then aX ∼ N(aµ, a²σ²).

Theorem 0.3 leads to a more general definition of multinormal, one which
also includes degenerate multinormal random variables.

Definition 6.7 A random vector X = (X1, X2) is said to be multinormal if
αX1 + βX2 is normal for all α, β ∈ R.

Example 0.3 Let X1 ∼ N(0, 1) and X = (X1, −X1). Then any linear
combination of the components of X is normally distributed. Also X does not
have a joint density function. To see this, let L = {(x, y) | x + y = 0}, the
graph of x + y = 0. Then

P{X ∈ L} = P(Ω) = 1.

Now suppose X has a joint pdf f ; then for Ln = L ∩ ([−n, n] × [−n, n]),

P{X ∈ Ln} = ∫∫_{Ln} f(x, y) dx dy = 0, for all n ≥ 1,

since Ln has zero area. Now

P{X ∈ L} = lim_{n→∞} P{X ∈ Ln} = 0,

a contradiction to P{X ∈ L} = 1. Hence X doesn't have a density, i.e., X
is an example of a degenerate multinormal distribution.
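A short simulation (my own illustration, not part of the notes) of Example 0.3: the sample covariance matrix of X = (X1, −X1) is singular, and every sample lies on the line x + y = 0:

```python
# Sketch: the degenerate multinormal (X1, -X1) has singular covariance.
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.standard_normal(100_000)
X = np.column_stack([x1, -x1])            # samples of (X1, -X1)

C = np.cov(X, rowvar=False)
print(C)                                  # approximately [[1, -1], [-1, 1]]
print(np.linalg.det(C))                   # approximately 0: Sigma is singular
print(np.all(X[:, 0] + X[:, 1] == 0))     # every sample lies on the line x + y = 0
```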

Chapter 7: Expectation and other moments

In this chapter, we introduce the expected value or mean of a random
variable. First we define expectation for discrete random variables and then
for general random variables. Finally we introduce other moments and comment
on the moment problem.

First we give a useful representation of discrete random variables.


Theorem 0.4 Let X be a discrete random variable defined on a probability
space (Ω, F, P ). Then there exists a partition {An | n = 1, 2, . . . , N} ⊆ F of
Ω and {an | n = 1, 2, . . . , N} ⊆ R such that the an's are distinct and

X = Σ_{n=1}^{N} an I_{An}  a.s. ,

where N may be ∞.
Proof. Let F be the distribution function of X. Let {a1, a2, . . . , aN} be the
set of all discontinuities of F . Here N may be ∞. Since X is discrete, we
have

Σ_{n=1}^{N} [F(an) − F(an−)] = 1 .

Set

An = {X = an} .

Then {An} is pairwise disjoint and ∪_{n=1}^{N} An = Ω a.s., i.e., {An} is a partition of
Ω (up to a null set) and

X = Σ_{n=1}^{N} an I_{An}  a.s.

Remark 0.2 If X is a discrete random variable on a probability space
(Ω, F, P ), then the 'effective' range of X is at most countable. Here
'effective' range means those values taken by X which have positive probability.
This leads to the name 'discrete' random variable.
In fact, in the above proof {An} is a partition of Ω0, a subset of Ω obtained
by excluding the non-probable values of X, and hence P(Ω0) = 1.

Remark 0.3 If X is a discrete random variable, then one can assume without
loss of generality that

X = Σ_{n=1}^{∞} an I_{An} ,

since if N < ∞, we can set An = ∅ for n ≥ N + 1 and choose an, n ≥ N + 1,
so that the an's remain distinct.

Theorem 0.5 Let {Bn} be a countable partition of Ω from F and {bn} be
a sequence of real numbers which are not necessarily distinct. If

Σ_{n=1}^{∞} an I_{An} = Σ_{n=1}^{∞} bn I_{Bn} ,

then

Σ_{n=1}^{∞} an P(An) = Σ_{n=1}^{∞} bn P(Bn) .

Proof. (Reading exercise) Note that the an's are distinct and hence it follows
that each Bm is contained in An for some n. For each n ≥ 1, set

In = {m ≥ 1 | An Bm ≠ ∅} .

Then clearly

An = ∪_{m∈In} Bm ,  n ≥ 1 .

Also if m0 ∈ In then an = bm0. Therefore

Σ_{m=1}^{∞} bm P(Bm) = Σ_{n=1}^{∞} Σ_{m∈In} bm P(Bm)
                    = Σ_{n=1}^{∞} Σ_{m∈In} an P(Bm)
                    = Σ_{n=1}^{∞} an P(An) .

This completes the proof.


Definition 7.1. Let X be a discrete random variable represented by {(An, an) | n ≥ 1}.
Then the expectation of X, denoted by EX, is defined as

EX = Σ_{n=1}^{∞} an P(An) ,

provided the series on the right hand side converges absolutely.

Remark 0.4 In view of Remark 6.0.5., if X has range {a1, a2, . . . , aN}, then

EX = Σ_{n=1}^{N} an P{X = an} .
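As a minimal illustration of Definition 7.1 and Remark 0.4 (the distribution below is made up for the example), the expectation is just the probability-weighted sum of the values:

```python
# Sketch: EX = sum of a_n * P(A_n) for a discrete random variable.
values = [-1.0, 0.0, 2.0, 5.0]
probs  = [0.1, 0.4, 0.3, 0.2]      # P(A_n) = P(X = a_n), summing to 1

EX = sum(a * p for a, p in zip(values, probs))
print(EX)                           # -0.1 + 0.0 + 0.6 + 1.0 = 1.5
```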

Example 0.4 Let X be a Bernoulli(p) random variable. Then

X = IA , where A = {X = 1} .

Hence
EX = P (A) = p .

Example 0.5 Let X be a Binomial(n, p) random variable. Then

X = Σ_{k=0}^{n} k I_{Ak} , where Ak = {X = k} .

Hence

EX = Σ_{k=0}^{n} k P(Ak) = Σ_{k=0}^{n} k (n choose k) p^k (1 − p)^{n−k}
   = n Σ_{k=1}^{n} (n−1 choose k−1) p^k (1 − p)^{n−k} = np .

Here we used the identity

k (n choose k) = n (n−1 choose k−1) .

Example 0.6 Let X be a Poisson(λ) random variable. Then

X = Σ_{n=0}^{∞} n I_{An} , where An = {X = n} .

Hence

EX = Σ_{n=0}^{∞} n e^{−λ} λ^n / n! = λ .

Example 0.7 Let X be a Geometric(p) random variable. Then

X = Σ_{n=1}^{∞} n I_{An} , where An = {X = n} .

Hence

EX = Σ_{n=1}^{∞} n p (1 − p)^{n−1} = p / (1 − (1 − p))² = 1/p .
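The three expectations above can be cross-checked numerically by summing an P(X = an) with scipy's pmf's (truncating the infinite sums at a large cutoff); this is only an illustration, not part of the notes:

```python
# Sketch: numerically checking EX for Binomial, Poisson and Geometric.
import numpy as np
from scipy import stats

n, p, lam = 10, 0.3, 2.5

k = np.arange(0, n + 1)
print(np.sum(k * stats.binom.pmf(k, n, p)), n * p)         # Binomial(n, p): np

k = np.arange(0, 200)
print(np.sum(k * stats.poisson.pmf(k, lam)), lam)           # Poisson(lam): lam

k = np.arange(1, 2000)
# scipy's geom is supported on {1, 2, ...} with pmf p(1-p)^{k-1}, matching Example 0.7
print(np.sum(k * stats.geom.pmf(k, p)), 1.0 / p)            # Geometric(p): 1/p
```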

Theorem 0.6 (Properties of expectation) (Reading exercise) Let X and Y


be discrete random variables with finite means. Then
(i) If X ≥ 0, then EX ≥ 0.
(ii) For a ∈ R
E(aX + Y ) = aEX + EY .

Proof. (i) Let {(An, an) | n ≥ 1} be a representation of X. Then X ≥ 0
implies an ≥ 0 for all n ≥ 1. Hence

EX = Σ_{n=1}^{∞} an P(An) ≥ 0 .

(ii) Let Y have a representation {(Bn, bn) | n ≥ 1}. Now by setting

Cnm = An Bm, n, m ≥ 1,   anm = an, m ≥ 1,   bnm = bm, n ≥ 1,

one can use the same partition for X and Y . Therefore

aX + Y = Σ_{n,m=1}^{∞} (a anm + bnm) I_{Cnm} .

Hence

E(aX + Y ) = Σ_{n,m=1}^{∞} (a anm + bnm) P(Cnm)
           = a Σ_{n=1}^{∞} Σ_{m=1}^{∞} anm P(Cnm) + Σ_{m=1}^{∞} Σ_{n=1}^{∞} bnm P(Cnm)
           = a Σ_{n=1}^{∞} Σ_{m=1}^{∞} an P(An Bm) + Σ_{m=1}^{∞} Σ_{n=1}^{∞} bm P(An Bm)
           = a Σ_{n=1}^{∞} an P(An) + Σ_{m=1}^{∞} bm P(Bm)
           = a EX + EY .
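A small finite example (entirely made up) checking linearity E(aX + Y ) = aEX + EY directly on a common sample space:

```python
# Sketch: linearity of expectation on a finite probability space.
omega_probs = [0.2, 0.3, 0.1, 0.4]          # P({w_i}) on Omega = {w_1, ..., w_4}
X = [1.0, 1.0, -2.0, 3.0]                    # X(w_i)
Y = [0.0, 5.0, 5.0, -1.0]                    # Y(w_i)
a = 2.5

E = lambda Z: sum(z * p for z, p in zip(Z, omega_probs))
lhs = E([a * x + y for x, y in zip(X, Y)])   # E(aX + Y)
rhs = a * E(X) + E(Y)                        # a EX + EY
print(lhs, rhs)                              # both equal 5.35
```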

Definition 7.2. (Simple random variable) A random variable is said to be
simple if it is discrete and its distribution function has only finitely many
discontinuities.

Theorem 0.7 Let X be a random variable on (Ω, F, P ) such that X ≥ 0.
Then there exists a sequence of simple random variables {Xn} satisfying

(i) For each n ≥ 1, Xn ≥ 0 and Xn ≤ Xn+1 ≤ X.

(ii) For each ω ∈ Ω, Xn(ω) → X(ω) as n → ∞.

Proof. For n ≥ 1, define the simple random variable Xn as follows:

Xn(ω) = k/2^n   if k/2^n ≤ X(ω) < (k + 1)/2^n,  k = 0, . . . , n2^n − 1,
Xn(ω) = 0       if X(ω) ≥ n .

Then the Xn's satisfy the following:

Xn ≤ Xn+1 ,  n ≥ 1,

lim_{n→∞} Xn(ω) = X(ω),  ω ∈ Ω .
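A sketch of the dyadic approximation used in the proof, implemented exactly as defined above (including the convention Xn = 0 on {X ≥ n}); only for illustration:

```python
# Sketch: the approximating simple functions X_n from the proof of Theorem 0.7.
import numpy as np

def X_n(x, n):
    """Level-n dyadic approximation of a non-negative value x."""
    x = np.asarray(x, dtype=float)
    out = np.floor(x * 2 ** n) / 2 ** n      # k/2^n with k/2^n <= x < (k+1)/2^n
    return np.where(x >= n, 0.0, out)        # the convention used in the proof

x = np.array([0.3, 1.7, 2.49, 5.0])
for n in [1, 2, 4, 8, 16]:
    print(n, X_n(x, n))                      # increases to x once n exceeds x
```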

Lemma 0.1 Let X be a non negative random variable and {Xn} be a sequence
of simple random variables satisfying (i) and (ii) of Theorem 0.7.
Then limn→∞ EXn exists and is given by

lim_{n→∞} EXn = sup{EY | Y is simple and Y ≤ X} .

Proof. (Reading exercise) Since Xn ≤ Xn+1, we have EXn ≤ EXn+1, n ≥ 1
(see exercise). Hence limn→∞ EXn exists. Also, since the Xn's are simple
and Xn ≤ X, clearly

EXn ≤ sup{EY | Y is simple and Y ≤ X} ,  n ≥ 1.

Therefore

lim_{n→∞} EXn ≤ sup{EY | Y is simple and Y ≤ X} .

Hence to complete the proof, it suffices to show that for Y simple and
Y ≤ X,

EY ≤ lim_{n→∞} EXn .

Let

Y = Σ_{k=1}^{m} ak I_{Ak} ,

where {Ak | k = 1, . . . , m} is a partition of Ω. Fix ε > 0 and set, for
1 ≤ k ≤ m and n ≥ 1,

Akn = {ω ∈ Ak | Xn(ω) ≥ ak − ε} .

Since Xn ≤ Xn+1, n ≥ 1, we have for each k,

Akn ⊆ Ak,n+1 ,  n ≥ 1 .

Also

ω ∈ Ak =⇒ X(ω) ≥ Y(ω) = ak
       =⇒ lim_{n→∞} Xn(ω) = X(ω) ≥ ak
       =⇒ Xn0(ω) ≥ ak − ε for some n0
       =⇒ ω ∈ Akn0 ⊆ ∪_{n=1}^{∞} Akn .

Hence

∪_{n=1}^{∞} Akn = Ak ,  1 ≤ k ≤ m .

From the definition of Akn we have

Xn ≥ Σ_{k=1}^{m} (ak − ε) I_{Akn} .

Hence

EXn ≥ Σ_{k=1}^{m} (ak − ε) P(Akn) .     (0.1)

Using the continuity property of probability, we have

lim_{n→∞} P(Akn) = P(Ak) ,  1 ≤ k ≤ m .

Now letting n → ∞ in (0.1), we get

lim_{n→∞} EXn ≥ Σ_{k=1}^{m} (ak − ε) P(Ak) = EY − ε .

Since ε > 0 is arbitrary, we get

lim_{n→∞} EXn ≥ EY .

This completes the proof.


Definition 7.3. The expectation of a non negative random variable X is
defined as

EX = lim_{n→∞} EXn ,     (0.2)

where {Xn} is a sequence of simple random variables as in Theorem 0.7.

Remark 0.5 One can define the expectation of a non negative random variable
X as

EX = sup{EY | Y is simple and Y ≤ X} .

But we use Definition 7.3, since it is more handy.
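As an illustration of Definition 7.3 (assuming X ∼ Exp(1), so that EX = 1; this example is my own, not from the notes), EXn can be computed directly from the cdf and is seen to increase to EX:

```python
# Sketch: EX_n for the simple approximations of Theorem 0.7, with X ~ Exp(1).
import numpy as np

def EX_n(n, cdf):
    # E X_n = sum over k of (k/2^n) * P(k/2^n <= X < (k+1)/2^n); the event {X >= n}
    # contributes 0 with the convention used in the proof of Theorem 0.7.
    k = np.arange(0, n * 2 ** n)
    left, right = k / 2 ** n, (k + 1) / 2 ** n
    return np.sum(left * (cdf(right) - cdf(left)))

exp_cdf = lambda t: 1.0 - np.exp(-t)          # cdf of Exp(1)
for n in [1, 2, 4, 8, 12]:
    print(n, EX_n(n, exp_cdf))                 # increases towards EX = 1
```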
