
7 Chebyshev's inequality and the law of large numbers
In this chapter we will look at some results from probability which can be useful in statistics.
We first prove a result, called Markov's inequality, which we will use to prove Chebyshev's inequality. But Markov's inequality can also be used to find some useful results, as we shall see. Note that both Markov and Chebyshev are Russian names and the originals are written in the Cyrillic alphabet. There are different ways of writing these letters in our alphabet, so some books may have Markoff or Tchebysheff or other versions.
Theorem 7.1. Markov's inequality
Let X be a random variable and g a non-negative function with domain the real line; then
$$P[g(X) \geq k] \leq \frac{E[g(X)]}{k}$$
for every $k > 0$.
Proof. Assume that X is continuous with pdf f(x); then
$$E[g(X)] = \int_{-\infty}^{\infty} g(x)f(x)\,dx = \int_{\{x:\,g(x)\geq k\}} g(x)f(x)\,dx + \int_{\{x:\,g(x)<k\}} g(x)f(x)\,dx$$
$$\geq \int_{\{x:\,g(x)\geq k\}} g(x)f(x)\,dx \quad \text{since } g(x), f(x) \geq 0$$
$$\geq \int_{\{x:\,g(x)\geq k\}} k f(x)\,dx = k\,P[g(X) \geq k]$$
and the result follows.
Example 7.1. X is a random variable taking only non-negative values. The mean $E[X] = \mu$. Use the Markov inequality to find an upper bound for $P(X \geq 3\mu)$. Find the exact probability if X has an exponential distribution with mean $1/\lambda$.
We may use Markov's inequality with $g(x) = x$ and $k = 3\mu$ to get
$$P(X \geq 3\mu) \leq \frac{E[X]}{3\mu} = \frac{\mu}{3\mu} = \frac{1}{3}.$$
If X has an exponential distribution with mean $1/\lambda$ then $\mu = 1/\lambda$ and the required probability is
$$\int_{3/\lambda}^{\infty} \lambda e^{-\lambda x}\,dx = \left[-e^{-\lambda x}\right]_{3/\lambda}^{\infty} = 0 + e^{-3} = 0.0498.$$
This value is a lot less than the upper bound, but the bound applies to all possible distributions.
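As a quick numerical check (a sketch in Python, not part of the original notes; the rate $\lambda = 1$ is an arbitrary illustrative choice), we can compare the Markov bound of $1/3$ with the exact exponential tail probability $e^{-3}$:

```python
import math

lam = 1.0                              # arbitrary rate; the exact answer e^{-3} does not depend on it
mu = 1 / lam                           # mean of the exponential distribution

markov_bound = mu / (3 * mu)           # E[X] / (3*mu) = 1/3
exact_tail = math.exp(-lam * 3 * mu)   # P(X >= 3*mu) = e^{-3}

print(markov_bound)                    # 0.3333...
print(exact_tail)                      # 0.0498 (approximately e^{-3})
```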
Example 7.2. Suppose X has a binomial distribution with $n = 90$ and $p = 1/3$. Use Markov's inequality to find an upper bound for $P(X \geq 50)$.
We may use Markov's inequality with $g(x) = x$ and $k = 50$ to get
$$P(X \geq 50) \leq \frac{E[X]}{50} = \frac{30}{50} = 0.6.$$
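For comparison, a short Python sketch (assuming scipy is available) contrasts the Markov bound with the exact binomial tail probability, which is far smaller:

```python
from scipy.stats import binom

n, p = 90, 1 / 3
markov_bound = (n * p) / 50        # E[X] / 50 = 30/50 = 0.6
exact_tail = binom.sf(49, n, p)    # P(X >= 50) = P(X > 49)

print(markov_bound)                # 0.6
print(exact_tail)                  # much smaller than the Markov bound
```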
Theorem 7.2. Chebyshev's inequality
If X is a random variable with mean $\mu$ and finite variance $\sigma^2$ then
$$P[|X - \mu| \geq r\sigma] = P[(X - \mu)^2 \geq r^2\sigma^2] \leq \frac{1}{r^2}$$
for every $r > 0$.
Proof. Take $g(x) = (x - \mu)^2$ and $k = r^2\sigma^2$ in Markov's inequality. Then
$$P[(X - \mu)^2 \geq r^2\sigma^2] \leq \frac{E[(X - \mu)^2]}{r^2\sigma^2}$$
but $E[(X - \mu)^2] = \sigma^2$ and the result follows.
Note that we can rewrite the inequality as
$$P[|X - \mu| < r\sigma] \geq 1 - \frac{1}{r^2}.$$
If we let $\epsilon = r\sigma$, so that $r = \epsilon/\sigma$, we have
$$P[|X - \mu| \geq \epsilon] \leq \frac{\sigma^2}{\epsilon^2}$$
and
$$P[|X - \mu| < \epsilon] \geq 1 - \frac{\sigma^2}{\epsilon^2}.$$
We see that
$$P[\mu - r\sigma \leq X \leq \mu + r\sigma] \geq 1 - \frac{1}{r^2}$$
so that, for example, if $r = 2$
$$P[\mu - 2\sigma \leq X \leq \mu + 2\sigma] \geq 1 - \frac{1}{4} = \frac{3}{4}.$$
Of course particular distributions may greatly exceed this lower bound. For example, if X is normally distributed then
$$P[\mu - 2\sigma \leq X \leq \mu + 2\sigma] > 0.95.$$
There are examples of random variables which attain the lower bound (see Assignment 9 for one such) so we cannot improve on Chebyshev's inequality without imposing extra conditions.
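The following Python sketch (assuming scipy is available) illustrates how conservative the bound is in the normal case, comparing the Chebyshev lower bound for $r = 2$ with the exact two-standard-deviation probability for a normal distribution:

```python
from scipy.stats import norm

r = 2
chebyshev_lower_bound = 1 - 1 / r**2             # 3/4, valid for any distribution with finite variance
normal_probability = norm.cdf(r) - norm.cdf(-r)  # P(mu - 2*sigma <= X <= mu + 2*sigma) for a normal

print(chebyshev_lower_bound)   # 0.75
print(normal_probability)      # about 0.954, well above the bound
```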
Example 7.3. Y is a random variable with mean 11 and variance 9. Use Chebyshev's inequality to find
(a) a lower bound for $P(6 < Y < 16)$
(b) the value of c such that $P(|Y - 11| \geq c) \leq 0.09$.
(a)
$$P(6 < Y < 16) = P(11 - 5 < Y < 11 + 5) = P(|Y - 11| < 5) \geq 1 - \frac{9}{25} = \frac{16}{25} = 0.64.$$
(b)
$$P(|Y - 11| \geq c) \leq \frac{\sigma^2}{c^2}.$$
Therefore we set $\sigma^2/c^2 = 0.09$, so $c^2 = 9/0.09 = 100$ and $c = 10$.
Example 7.4. The US mint produces dimes with an average diameter of 0.5 ins and a standard deviation of 0.01 ins. Using Chebyshev's inequality find a lower bound for the number of coins in a batch of 400 having diameter between 0.48 and 0.52 ins.
We use
$$P[|X - \mu| < r\sigma] \geq 1 - \frac{1}{r^2}$$
with $\mu = 0.5$, $\sigma = 0.01$ and $r = 2$. So
$$P[|X - 0.5| < 2 \times 0.01] \geq 1 - \frac{1}{4} = \frac{3}{4}$$
therefore at least 300 of the 400 coins will have diameters between 0.48 and 0.52 ins.
Example 7.5. For a certain soil the number of wireworms per cubic foot has mean 100. Assuming a Poisson distribution for the number of wireworms, give an interval that will include at least 5/9 of the sample values of wireworm counts obtained from a large number of 1 cubic foot samples.
We have $\mu = 100$, $\sigma^2 = 100$ (since it is Poisson) and so $\sigma = 10$. Using Chebyshev's inequality we have
$$P[|X - 100| < 10r] \geq 1 - \frac{1}{r^2}$$
so that
$$P(100 - 10r \leq X \leq 100 + 10r) \geq 1 - \frac{1}{r^2}.$$
We want $1 - 1/r^2 = 5/9$, which implies that $r = 3/2$, so the required interval is (85, 115).
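As a check on how conservative the Chebyshev interval is here, a short Python sketch (assuming scipy is available) computes the exact Poisson(100) probability of the interval (85, 115):

```python
from scipy.stats import poisson

mu = 100
chebyshev_lower_bound = 1 - 1 / (3 / 2)**2           # 5/9, from r = 3/2
exact = poisson.cdf(115, mu) - poisson.cdf(84, mu)   # P(85 <= X <= 115) for Poisson(100)

print(chebyshev_lower_bound)   # 0.555...
print(exact)                   # noticeably larger than 5/9
```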
Consider a random variable Y which is the number of successes in
n Bernoulli trials with probability of success p on each trial. Thus
Y has a binomial distribution with parameters n and p. Y/n is the
proportion of successes. If p is unknown then Y/n is an estimate of
p. How close is Y/n to p as n increases?
We know that
$$\mathrm{Var}\left(\frac{Y}{n}\right) = \frac{1}{n^2}\,\mathrm{Var}(Y) = \frac{1}{n^2}\,np(1-p) = \frac{p(1-p)}{n}.$$
Let $\epsilon > 0$. Then Chebyshev's inequality gives us
$$P\left[\left|\frac{Y}{n} - p\right| < \epsilon\right] \geq 1 - \frac{p(1-p)}{n\epsilon^2}.$$
So for a fixed $\epsilon$ and $0 < p < 1$, if we take the limit as n tends to infinity we have
$$\lim_{n\to\infty} P\left[\left|\frac{Y}{n} - p\right| < \epsilon\right] \geq \lim_{n\to\infty}\left(1 - \frac{p(1-p)}{n\epsilon^2}\right) = 1$$
but a limit of probabilities cannot be larger than 1 so
$$\lim_{n\to\infty} P\left[\left|\frac{Y}{n} - p\right| < \epsilon\right] = 1.$$
So if n is large enough, Y/n will be within $\epsilon$ of p for any $\epsilon > 0$.
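A minimal simulation sketch in Python (the parameter values are arbitrary illustrative choices, not from the notes) shows the proportion Y/n concentrating near p as n grows, alongside the Chebyshev lower bound used above:

```python
import numpy as np

rng = np.random.default_rng(0)
p, eps, reps = 1 / 3, 0.02, 10_000

for n in (100, 1_000, 10_000):
    y = rng.binomial(n, p, size=reps)                # repeated draws of Y ~ Binomial(n, p)
    within = np.mean(np.abs(y / n - p) < eps)        # observed fraction with |Y/n - p| < eps
    bound = max(1 - p * (1 - p) / (n * eps**2), 0)   # Chebyshev lower bound (truncated at 0)
    print(n, within, bound)
```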
More generally, let X be a random variable with density $f_X(x)$, mean $\mu$ and finite variance $\sigma^2$. Let $\bar{X}_n$ be the sample mean of a random sample of size n from this distribution. So $\bar{X}_n$ has mean $\mu$ and variance $\sigma^2/n$. Let $\epsilon$ and $\alpha$ be any two specified numbers satisfying $\epsilon > 0$ and $0 < \alpha < 1$. If n is any integer greater than $\sigma^2/(\alpha\epsilon^2)$ then
$$P(|\bar{X}_n - \mu| < \epsilon) = P(\mu - \epsilon < \bar{X}_n < \mu + \epsilon) \geq 1 - \alpha.$$
Proof. We apply Markov's inequality with $g(\bar{X}_n) = (\bar{X}_n - \mu)^2$ and $k = \epsilon^2$. Then
$$P(|\bar{X}_n - \mu| < \epsilon) = P((\bar{X}_n - \mu)^2 < \epsilon^2) \geq 1 - \frac{E[(\bar{X}_n - \mu)^2]}{\epsilon^2} = 1 - \frac{\sigma^2/n}{\epsilon^2} \geq 1 - \alpha$$
so long as $\alpha > \dfrac{\sigma^2}{n\epsilon^2}$, or equivalently $n > \dfrac{\sigma^2}{\alpha\epsilon^2}$.
This is called a law of large numbers.
Example 7.6. X has an unknown mean and variance equal to 1. How large a random sample must be taken in order that the probability will be at least 0.95 that the sample mean will lie within 0.5 of the population mean?
We have $\sigma^2 = 1$, $\epsilon = 0.5$, $1 - \alpha = 0.95$ so that $\alpha = 0.05$. So we need
$$n > \frac{\sigma^2}{\alpha\epsilon^2} = \frac{1}{0.05 \times (0.5)^2} = 80.$$
Example 7.7. How large a sample must be taken in order that you are 99% certain that $\bar{X}_n$ is within $0.5\sigma$ of $\mu$? We have $\epsilon = 0.5\sigma$, $1 - \alpha = 0.99$ so that $\alpha = 0.01$. So we need
$$n > \frac{\sigma^2}{\alpha\epsilon^2} = \frac{\sigma^2}{0.01 \times (0.5\sigma)^2} = 400.$$
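The sample-size rule $n > \sigma^2/(\alpha\epsilon^2)$ is easy to wrap in a small helper; the Python sketch below (the function name is my own, not from the notes) reproduces the two examples, with Example 7.7 worked in units of $\sigma$ so that $\epsilon = 0.5$:

```python
from fractions import Fraction

def chebyshev_sample_size_threshold(sigma2, eps, alpha):
    """Return sigma2 / (alpha * eps**2) exactly; any integer n above this works."""
    return Fraction(sigma2) / (Fraction(alpha) * Fraction(eps) ** 2)

# Example 7.6: sigma^2 = 1, eps = 0.5, alpha = 0.05
print(chebyshev_sample_size_threshold(1, "0.5", "0.05"))   # 80, so we need n > 80

# Example 7.7, working in units of sigma (eps = 0.5*sigma, sigma^2 = 1), alpha = 0.01
print(chebyshev_sample_size_threshold(1, "0.5", "0.01"))   # 400, so we need n > 400
```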
8 Limiting moment generating functions and the Central Limit Theorem
We start by showing that a binomial can be approximated by a Poisson when n is sufficiently large and p is fairly small. You should have seen this before, but we are going to show it in a new way by taking the limit of an mgf. We make use of the following theorem, which we will not prove:
Theorem 8.1. If a sequence of moment generating functions approaches a certain mgf, say M(t), then the limit of the corresponding distributions must be the distribution with mgf M(t).
Let Y be binomial with parameters n and p. We take the limit as $n \to \infty$ such that $np = \lambda$, so that $p \to 0$.
The moment generating function of Y is $M_Y(t) = (1 - p + pe^t)^n$. Letting $p = \lambda/n$ we have
$$M_Y(t) = \left(1 - \frac{\lambda}{n} + \frac{\lambda}{n}e^t\right)^n = \left(1 + \frac{\lambda(e^t - 1)}{n}\right)^n.$$
A well known result is that $\lim_{n\to\infty}(1 + b/n)^n = e^b$. Using this result we see that
$$\lim_{n\to\infty} M_Y(t) = \exp\{\lambda(e^t - 1)\}$$
but this is the mgf of a Poisson distribution with mean $\lambda$ and hence, applying the theorem, we see that the limit of the binomial distributions is a Poisson distribution.
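A short Python sketch (assuming scipy is available; $\lambda = 3$ and $n = 1000$ are arbitrary illustrative choices) shows the binomial pmf with $p = \lambda/n$ settling down to the Poisson pmf:

```python
from scipy.stats import binom, poisson

lam, n = 3.0, 1000
p = lam / n

for k in range(6):
    # the two columns agree closely for large n
    print(k, binom.pmf(k, n, p), poisson.pmf(k, lam))
```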
We can now see a proof of the Central Limit Theorem. This result has been stated and used before to justify assuming a sample mean is approximately normal, and for other approximations. This proof applies to distributions with a finite positive variance and a moment generating function. Other versions of the theorem make less restrictive assumptions, but we cannot prove those.
Theorem 8.2. Central Limit Theorem
If $\bar{X}_n$ is the mean of a random sample $X_1, X_2, \ldots, X_n$ from a distribution with a finite mean $\mu$ and a finite positive variance $\sigma^2$, and the moment generating function of X exists, then the distribution of
$$W_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} = \frac{\sum X_i - n\mu}{\sigma\sqrt{n}}$$
is N(0, 1) in the limit as $n \to \infty$.
Proof. We let $Y = (X - \mu)/\sigma$ with moment generating function $M_Y(t)$.
We find the moment generating function of $W_n$.
$$E[\exp(tW_n)] = E\left[\exp\left\{\frac{t}{\sigma\sqrt{n}}\left(\sum X_i - n\mu\right)\right\}\right]$$
$$= E\left[\exp\left\{\frac{t}{\sigma\sqrt{n}}(X_1 - \mu)\right\}\cdots\exp\left\{\frac{t}{\sigma\sqrt{n}}(X_n - \mu)\right\}\right]$$
$$= E\left[\exp\left\{\frac{t}{\sigma\sqrt{n}}(X_1 - \mu)\right\}\right]\cdots E\left[\exp\left\{\frac{t}{\sigma\sqrt{n}}(X_n - \mu)\right\}\right]$$
$$= \left[M_Y\left(\frac{t}{\sqrt{n}}\right)\right]^n$$
using the independence of the $X_i$.
Now $E[Y] = 0$ and $\mathrm{Var}[Y] = E[Y^2] = 1$, so $M_Y'(0) = 0$ and $M_Y''(0) = 1$.
We expand $M_Y(t)$ in a Taylor expansion as
$$M_Y(t) = 1 + M_Y'(0)t + M_Y''(0)\frac{t^2}{2!} + M_Y'''(0)\frac{t^3}{3!} + \cdots = 1 + \frac{t^2}{2} + M_Y'''(0)\frac{t^3}{3!} + \cdots$$
therefore $M_Y(t/\sqrt{n})$ is given by
$$M_Y(t/\sqrt{n}) = 1 + \frac{t^2}{2n} + M_Y'''(0)\frac{t^3}{3!\,n^{3/2}} + \cdots = 1 + \frac{1}{n}\left[\frac{t^2}{2} + \frac{M_Y'''(0)\,t^3}{3!\,\sqrt{n}} + \cdots\right]$$
We see that
$$\left[M_Y(t/\sqrt{n})\right]^n = \left[1 + \frac{1}{n}\left(\frac{t^2}{2} + \frac{M_Y'''(0)\,t^3}{3!\,\sqrt{n}} + \cdots\right)\right]^n.$$
Thus if n is large we can truncate the expansion (the terms beyond $t^2/2$ inside the bracket tend to 0 as $n \to \infty$ and do not affect the limit), and so
$$\lim_{n\to\infty}\left[M_Y(t/\sqrt{n})\right]^n = \lim_{n\to\infty}\left[1 + \frac{t^2}{2n}\right]^n = \exp(t^2/2)$$
but this is the mgf of a standard normal distribution and so we see that $W_n$ has a limiting N(0, 1) distribution.
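A minimal simulation sketch in Python (exponential data with $\mu = \sigma = 1$ is an arbitrary illustrative choice, not from the notes) shows the standardised sample mean $W_n$ behaving like a standard normal for moderately large n:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 20_000

x = rng.exponential(scale=1.0, size=(reps, n))     # exponential(1): mu = 1, sigma = 1
w = (x.mean(axis=1) - 1.0) / (1.0 / np.sqrt(n))    # W_n = (X_bar - mu) / (sigma / sqrt(n))

print(np.mean(w <= 0))               # close to 0.5, as for N(0, 1)
print(np.mean(np.abs(w) <= 1.96))    # close to 0.95, as for N(0, 1)
```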