Lecture Notes 3
Renato Feres
1.1
A random variable X is said to be discrete if, with probability one, it can take
only a finite or countably infinite number of possible values. That is, there is a
set $\{x_1, x_2, \dots\} \subset \mathbb{R}$ such that
\[ \sum_{k=1}^{\infty} P(X = x_k) = 1. \]
Rb
a
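Proposition 1.1 (PDF of a linear transformation) Let X be a continuous random variable with PDF $f_X(x)$ and cumulative distribution function $F_X(x)$
and let $Y = aX + b$ with $a \neq 0$. Then the PDF of Y is given by
\[ f_Y(y) = \frac{1}{|a|} f_X\left(\frac{y-b}{a}\right). \]
Proof. First assume that $a > 0$. Since $Y \le y$ if and only if $X \le (y-b)/a$, we
have
\[ F_Y(y) = P(Y \le y) = P(X \le (y-b)/a) = F_X((y-b)/a). \]
Differentiating both sides, we find
\[ f_Y(y) = \frac{1}{a} f_X\left(\frac{y-b}{a}\right). \]
The case $a < 0$ is similar: the inequality is reversed, which accounts for the absolute value in the statement.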
1.2 Expectation
The most basic parameter associated to a random variable is its expected value
or mean. Fix a probability space $(S, F, P)$ and let $X : S \to \mathbb{R}$ be a random
variable.
Definition 1.1 (Expectation) The expectation or mean value of the random
variable X is defined as
\[
E[X] =
\begin{cases}
\sum_{i=1}^{\infty} x_i P(X = x_i) & \text{if } X \text{ is discrete} \\
\int_{-\infty}^{\infty} x f_X(x)\,dx & \text{if } X \text{ is continuous.}
\end{cases}
\]
Example 1.1 (A game of dice) A game consists of tossing a die and receiving a payoff X equal to $n for n pips. It is natural to define the fair price to
play one round of the game as being the expected value of X. If you could play
the game for less than E[X], you would make a sure profit by playing it long
enough, and if you pay more you are sure to lose money in the long run. The
fair price is then
\[ E[X] = \sum_{i=1}^{6} \frac{i}{6} = \frac{21}{6} = \$3.50. \]
Example 1.2 (Waiting in line) Let us suppose that the waiting time to be
served at the post office at a particular location and time of day is known to
follow an exponential distribution with parameter $\lambda = 6$ (in units 1/hour).
What is the expected time of wait? We have now a continuous random variable
T with probability density function $f_T(t) = \lambda e^{-\lambda t}$. The expected value is easily
calculated to be:
\[ E[T] = \int_0^{\infty} t \lambda e^{-\lambda t}\,dt = \frac{1}{\lambda}. \]
Therefore, the mean time of wait is one-sixth of an hour, or 10 minutes.
It is a bit inconvenient to have to distinguish the continuous and discrete
cases every time we refer to the expected value of a random variable. For this
reason, we need a uniform notation that represents all cases. We will use the
notation for the Lebesgue integral, introduced in the appendix of the previous
set of notes. (You do not need to know about the Lebesgue integral. We are
only using the notation.) So we will often denote the expected value of a random
variable X, of any type, by
\[ E[X] = \int_S X(s)\,dP(s). \]
For discrete random variables, the same integral represents the sum in definition
1.1.
Here are a few simple properties of expectations.
Proposition 1.3 Let X and Y be random variables on the probability space
(S, F, P ). Then:
1. If $X \ge 0$ then $E[X] \ge 0$.
2. For any real number a, E[aX] = aE[X].
1.3 Variance
p
Var(X).
The variance is
Z
Var(D) =
0
Some of the general properties of variance are enumerated in the next proposition. They can be derived from the definitions by simple calculations. The
details are left as exercises.
Proposition 1.4 Let X, Y be random variables on a probability space $(S, F, P)$.
1. $\mathrm{Var}(X) = E[X^2] - E[X]^2$.
2. $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$, where a is any real constant.
3. If X and Y have finite variance and are independent, then
$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$.
The variance of a sum of any number of independent random variables now
follows from the above results. The next proposition implies that the standard
deviation of the arithmetic mean of independent random variables $X_1, \dots, X_n$
decreases like $1/\sqrt{n}$.
Proposition 1.5 Let $X_1, X_2, \dots, X_n$ be independent random variables having
the same standard deviation $\sigma$. Denote their sum by
\[ S_n = X_1 + \cdots + X_n. \]
Then
\[ \mathrm{Var}\left(\frac{S_n}{n}\right) = \frac{\sigma^2}{n}. \]
Mean and variance are examples of moments and central moments of probability distributions. These are defined as follows.
Definition 1.3 (Moments) The moment of order k = 1, 2, 3, . . . of a random
variable X is defined as
\[ E[X^k] = \int_S X(s)^k\,dP(s). \]
The meaning of the central moments, and the variance in particular, is easier
to interpret using Chebyshev's inequality. Broadly speaking, this inequality says
that if the central moments are small, then the random variable cannot deviate
much from its mean.
Theorem 1.1 (Chebyshev inequality) Let $(S, F, P)$ be a probability space,
$X : S \to \mathbb{R}$ a random variable, and $\epsilon > 0$ a fixed number.
\[
E[X^k] \ge \int_{\{s \in S : X(s) \ge \epsilon\}} X(s)^k\,dP(s)
\ge \int_{\{s \in S : X(s) \ge \epsilon\}} \epsilon^k\,dP(s)
= \epsilon^k P(X \ge \epsilon).
\]
So we get $P(X \ge \epsilon) \le E[X^k]/\epsilon^k$ as claimed.
Example 1.4 Chebyshev's inequality, in the form of inequality 3 in the theorem, implies that if X is a random variable with finite mean m and finite
variance $\sigma^2$, then the probability that X lies in the interval $(m - 3\sigma, m + 3\sigma)$ is
at least $1 - 1/3^2 = 8/9$.
Example 1.5 (Tosses of a fair coin) We make N = 1000 tosses of a fair coin
and denote by $S_N$ the number of heads. Notice that $S_N = X_1 + X_2 + \cdots + X_N$,
where $X_i$ is 1 if the i-th toss comes up heads, and 0 if tails. We assume that the $X_i$
are independent and $P(X_i = 0) = P(X_i = 1) = 1/2$. Then $E[S_N] = N/2 = 500$
and $\mathrm{Var}(S_N) = N/4 = 250$. From the second inequality in theorem 1.1 we have that
$P(450 \le S_N \le 550)$ is at least $1 - 250/50^2 = 0.9$.
A better estimate of the dispersion around the mean will be provided by the
central limit theorem, discussed later.
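The bound of Example 1.5 can be compared with a simulation. The following is a minimal sketch, not part of the original notes; the number of repetitions M is an arbitrary choice. It repeats the 1000-toss experiment M times and reports how often the number of heads falls in [450, 550]. The observed frequency should be no smaller than the Chebyshev bound 0.9, and is typically much closer to 1, which illustrates that the bound is not sharp.

```matlab
% Empirical check of the Chebyshev bound for N = 1000 fair coin tosses.
N = 1000;                           % tosses per experiment (as in Example 1.5)
M = 2000;                           % number of repeated experiments (arbitrary)
S = sum(rand(N,M) < 0.5, 1);        % number of heads in each experiment
freq = mean(S >= 450 & S <= 550);   % observed frequency of 450 <= S_N <= 550
bound = 1 - 250/50^2;               % Chebyshev lower bound from the example
fprintf('observed frequency %.4f, Chebyshev bound %.4f\n', freq, bound);
```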
2.1
For example, let $X_1, X_2, \dots$ be a sequence of independent, identically distributed random variables with two outcomes: 1 with probability p and 0 with
probability $1 - p$. Then the weak law of large numbers says that the arithmetic
mean $S_n/n$ converges to $p = E[X_i]$ in the sense that, for any $\epsilon > 0$, the probability that $S_n/n$ lies outside the interval $[p - \epsilon, p + \epsilon]$ goes to zero as n goes to
$\infty$.
2.2
The weak law of large numbers, applied to a sequence $X_i \in \{0, 1\}$ of coin tosses,
says that Sn /n must lie in an arbitrarily small interval around 1/2 with high
probability (arbitrarily close to 1) if n is taken big enough. A stronger statement
would be to say that, with probability one, a sequence of coin tosses yields a
sum Sn such that Sn /n actually converges to 1/2.
To explain the meaning of the stronger claim, let us be more explicit and
view the random variables as functions $X_i : S \to \mathbb{R}$ on the same probability
space $(S, F, P)$. Then, for each $s \in S$ we can consider the sample sequence
X1 (s), X2 (s), . . . , as well as the arithmetic averages Sn (s)/n, and ask whether
Sn (s)/n (an ordinary sequence of numbers) actually converges to 1/2. The
strong law of large numbers states that the set of s for which this holds is an
event of probability 1. This is a much more subtle result than the weak law,
and we will be content with simply stating the general theorem.
Theorem 2.2 (Strong law of large numbers) Let $(S, F, P)$ be a probability
space and let $X_1, X_2, \dots$ be random variables defined on S with finite means and
variances satisfying
\[ \sum_{i=1}^{\infty} \frac{\mathrm{Var}(X_i)}{i^2} < \infty. \]
Then, there is an event $E \in F$ of probability 1 such that for all $s \in E$,
\[ \frac{S_n(s)}{n} - \frac{E[S_n]}{n} \to 0 \]
as $n \to \infty$. In particular, if in addition all the means are equal to m then for
all s in a subset of S of probability 1,
\[ \lim_{n \to \infty} \frac{S_n(s)}{n} = m. \]
The reason why the normal distribution arises so often is the central limit theorem. We state this theorem here without proof, although experimental evidence
for its validity will be given in a number of examples.
Let $(S, F, P)$ be a probability space and $X_1, X_2, \dots$ be independent random
variables defined on S. Assume that the $X_i$ have a common distribution with
finite expectation m and finite nonzero variance $\sigma^2$. Define the sum
\[ S_n = X_1 + X_2 + \cdots + X_n. \]
Theorem 3.1 (The central limit theorem) If $X_1, X_2, \dots$ are independent
random variables with mean $\mu$ and variance $\sigma^2$, then the random variable
\[ Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}} \]
Figure 1: Convolution powers of the function $f(x) = 1$ over $[-1, 1]$ (panels: $f$, $f * f$, $f * f * f$, $f * f * f * f$). By the
central limit theorem, after centering and re-scaling (not done in the figure),
$f^{*n}$ approaches a normal distribution.
Example 3.1 (Die tossing) Consider the experiment of tossing a fair die n
times. Let $X_i$ be the number obtained in the i-th toss and $S_n = X_1 + \cdots + X_n$.
The $X_i$ are independent and have a common discrete distribution with mean $\mu = 3.5$
and variance $\sigma^2 = 35/12$. Assuming n = 1000, by the central limit theorem $S_n$ has
approximately the normal distribution with mean $\mu(S_n) = 3500$ and standard
deviation $\sigma(S_n) = \sqrt{35 \times 1000/12}$, which is approximately 54. Therefore, if we
simulate the experiment of tossing a die 1000 times, repeat the experiment a
number of times (say 500) and plot a histogram of the result, what we obtain
should be approximated by the function
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \]
with $\mu = 3500$ and $\sigma \approx 54$.
Figure 2: Comparison between the sample distribution given by the stem plot
and the normal distribution for the experiment of tossing a die 1000 times and
counting the total number of pips.
Example 3.2 (Die tossing II) We would like to compute the probability that
after tossing a die 1000 times, one obtains more than 150 sixes. Here, we consider the random variable $X_i$, i = 1, . . . , 1000, taking values in {0, 1}, with
$P(X_i = 1) = 1/6$. ($X_i = 1$ represents the event of getting a 6 in the i-th toss.)
Writing $S_n = X_1 + \cdots + X_n$, we wish to compute the probability
\[ P(S_{1000} > 150). \]
Each $X_i$ has mean $p = 1/6$ and standard deviation $\sqrt{(1-p)p}$. By the central
limit theorem, we approximate the probability distribution of $S_n$ by a normal
distribution with mean $\mu = 1000p$ and standard deviation $\sigma = \sqrt{1000(1-p)p}$. This is approximately $\mu = 166.67$ and $\sigma = 11.79$. Now, the distribution of $(S_n - \mu)/\sigma$ is
approximately standard normal, and $(150 - \mu)/\sigma \approx -1.41$, so
\[ P(S_{1000} > 150) \approx \frac{1}{\sqrt{2\pi}} \int_{-1.41}^{10} e^{-\frac{1}{2}z^2}\,dz = 0.9215. \]
The integral above was evaluated numerically by a simple Riemann sum over
the interval $[-1.41, 10]$ and step-size 0.01. We conclude that the probability of
obtaining more than 150 sixes in 1000 tosses is approximately 0.92.
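The Riemann sum mentioned in Example 3.2 takes only a few lines. The following is a sketch under the assumptions stated in the example (interval [-1.41, 10], step 0.01); the variable names are illustrative and not taken from the notes.

```matlab
% Riemann-sum evaluation of the normal approximation in Example 3.2.
p = 1/6; n = 1000;
mu = n*p;                         % mean of S_n, about 166.67
sigma = sqrt(n*p*(1-p));          % standard deviation of S_n, about 11.79
zlow = (150 - mu)/sigma;          % standardized threshold, about -1.41
h = 0.01;                         % step size
z = -1.41:h:10;                   % grid on [-1.41, 10]
prob = sum(exp(-z.^2/2))*h/sqrt(2*pi);
fprintf('zlow = %.4f, P(S_1000 > 150) is approximately %.4f\n', zlow, prob);
```

The upper limit 10 stands in for infinity; the standard normal density is negligible beyond that point.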
Up until now we have mostly done simulations of random variables with a finite
number of possible values. In this section we explore a few ideas for simulating
continuous random variables.
Suppose we have a continuous random variable X with probability density
function f (x) and we wish to evaluate the expectation E[g(X)], for some function g(x). This requires evaluating the integral
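\[ E[g(X)] = \int g(x) f(x)\,dx. \]
If $x_1, x_2, \dots, x_n$ are independent simulated realizations of X, the integral can be approximated by the sample mean
\[ E[g(X)] \approx \frac{1}{n} \sum_{i=1}^{n} g(x_i). \]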
This is the basic idea behind Monte-Carlo integration. It may happen that we
cannot simulate realizations of X, but we can simulate realizations $y_1, y_2, \dots, y_n$
of a random variable Y with probability density function $h(x)$ which is related
to X in that $h(x)$ is not 0 unless $f(x)$ is 0. In this case we can write
\[
\begin{aligned}
E[g(X)] &= \int g(x) f(x)\,dx \\
&= \int \frac{g(x) f(x)}{h(x)}\, h(x)\,dx \\
&= E[g(Y) f(Y)/h(Y)] \\
&\approx \frac{1}{n} \sum_{i=1}^{n} \frac{g(y_i) f(y_i)}{h(y_i)}.
\end{aligned}
\]
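As an illustration of the last formula, here is a minimal sketch. The choices are assumptions made for this example only, not taken from the notes: the target density is f(x) = 3x^2 on [0, 1], the sampling density h is uniform on [0, 1], and g(x) = x, so that the exact value is E[g(X)] = 3/4.

```matlab
% Monte-Carlo estimate of E[g(X)] using samples from a different density h.
n = 1e5;                        % number of samples (arbitrary)
g = @(x) x;                     % function whose expectation is wanted
f = @(x) 3*x.^2;                % target density on [0,1] (illustrative)
h = @(x) ones(size(x));         % sampling density: uniform on [0,1]
y = rand(1,n);                  % realizations of Y with density h
est = mean(g(y).*f(y)./h(y));   % weighted sample mean
exact = 3/4;                    % integral of x*3x^2 over [0,1]
fprintf('estimate %.4f, exact %.4f\n', est, exact);
```

The weights f(y)/h(y) correct for the fact that the points were drawn from h rather than from f.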
4.1
Figure 3: Simulation of 1000 random points on the square $[-1, 1]^2$ with the
uniform distribution. To approximate the ratio of the area of the disc over the
area of the square we compute the fraction of points that fall on the disc.
The above example should prompt the question: How do we estimate the
error involved in, say, our calculation of $\pi$, and how do we determine the number
of random points needed for a given precision? First, consider the probability
space $S = [-1, 1]^2$ with probability measure given by
\[ P(E) = \frac{1}{4} \iint_E dx\,dy, \]
and the random variable $D : S \to \{0, 1\}$ which is 1 for a point in the disc and 0
for a point in the complement of the disc. The expected value of D is $\mu = \pi/4$
and the variance is easily calculated to be $\sigma^2 = \mu(1 - \mu) = \pi(4 - \pi)/16$. If we
draw n independent points on the square, and call the outcomes $D_1, D_2, \dots, D_n$,
then the fraction of points in the disc is given by the random variable
\[ \bar{D}_n = \frac{D_1 + \cdots + D_n}{n}. \]
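the error $K\sigma/\sqrt{n}$. Equivalently, we ask for the probability $P(|Z_n| \le K)$, where
\[ Z_n = \frac{\bar{D}_n - \mu}{\sigma/\sqrt{n}}. \]
This probability can now be estimated using the central limit theorem. Recall that the probability distribution density of $Z_n$, for big n, is very nearly a
standard normal density, so
\[ P\left(|\bar{D}_n - \mu| \le K\sigma/\sqrt{n}\right) \approx \frac{1}{\sqrt{2\pi}} \int_{-K}^{K} e^{-\frac{1}{2}z^2}\,dz. \]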
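The error estimate can be tried out numerically. The sketch below is an illustration, not the code from the notes: it draws n uniform points on the square, estimates pi as four times the fraction falling in the disc, and attaches the half-width 4K sigma/sqrt(n). The value K = 1.96, for which the normal integral above is about 0.95, and the use of the sample fraction in place of the unknown mu when computing sigma are assumptions of this example.

```matlab
% Monte-Carlo estimate of pi with a central-limit-theorem error bar.
n = 10000;                       % number of random points (arbitrary)
P = 2*rand(2,n) - 1;             % n uniform points in the square [-1,1]^2
D = (sum(P.^2,1) <= 1);          % D_i = 1 if the i-th point lies in the disc
Dbar = mean(D);                  % fraction of points in the disc
piest = 4*Dbar;                  % estimate of pi
sigma = sqrt(Dbar*(1-Dbar));     % plug-in estimate of sigma = sqrt(mu(1-mu))
K = 1.96;                        % P(|Z| <= 1.96) is about 0.95
err = 4*K*sigma/sqrt(n);         % half-width of the interval for pi
fprintf('pi is approximately %.4f +/- %.4f\n', piest, err);
```

Since the half-width scales like 1/sqrt(n), one more decimal digit of precision costs about 100 times as many points.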
4.2 Transformation methods
xa
,
ba
has an exponential distribution with parameter $\lambda$. In fact, an exponential random variable has PDF
\[ f(x) = \lambda e^{-\lambda x} \]
and its cumulative distribution function is easily obtained by explicit integration:
\[ F(x) = 1 - e^{-\lambda x}. \]
Therefore,
\[ F^{-1}(u) = -\frac{1}{\lambda} \log(1 - u). \]
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y=exponential(lambda,n)
%Simulates n independent realizations of a
%random variable with the exponential
%distribution with parameter lambda.
%Since 1-U is uniform on (0,1) whenever U is,
%log(rand) can be used in place of log(1-rand).
y=-log(rand(1,n))/lambda;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
4.3 Lookup methods
k
X
pi .
i=0
4.4 Scaling
4.5
The methods of this and the next subsection are examples of the rejection
sampler method.
Suppose we want to simulate a random variable with PDF $f(x)$ such that
$f(x)$ is zero outside of the interval $[a, b]$ and $f(x) \le L$ for all x. Choose $X \sim U(a, b)$
and $Y \sim U(0, L)$ independently. If $Y < f(X)$, accept X as the simulated
value we want. If the acceptance condition is not satisfied, draw a new pair (X, Y)
and repeat until it holds. Then take that X for which $Y < f(X)$ as the output of the
algorithm and call it $\bar{X}$. This procedure is referred to as the uniform rejection
method for density $f(x)$.
Proposition 4.2 (Uniform rejection method) The random variable $\bar{X}$ produced by the uniform rejection method for density $f(x)$ has probability density function $f(x)$.
Proof. Let A represent the region in $[a, b] \times [0, L]$ consisting of points (x, y)
such that $y < f(x)$. We call A the acceptance region. As above, we denote by
(X, Y) a random variable uniformly distributed on $[a, b] \times [0, L]$. Let $F(x) = P(\bar{X} \le x)$
denote the cumulative distribution function of $\bar{X}$. We wish to show
that $F(x) = \int_a^x f(s)\,ds$. This is a consequence of the following calculation, which
uses the continuous version of the total probability formula and the key fact:
$P((X, Y) \in A \mid X = s) = f(s)/L$.
\[
\begin{aligned}
F(x) &= P(\bar{X} \le x) \\
&= P(X \le x \mid (X, Y) \in A) \\
&= \frac{P(\{X \le x\} \cap \{(X, Y) \in A\})}{P((X, Y) \in A)} \\
&= \frac{\frac{1}{b-a} \int_a^b P(\{X \le x\} \cap \{(X, Y) \in A\} \mid X = s)\,ds}{\frac{1}{b-a} \int_a^b P((X, Y) \in A \mid X = s)\,ds} \\
&= \frac{\int_a^x P((X, Y) \in A \mid X = s)\,ds}{\int_a^b P((X, Y) \in A \mid X = s)\,ds} \\
&= \frac{\int_a^x \frac{f(s)}{L}\,ds}{\int_a^b \frac{f(s)}{L}\,ds} \\
&= \int_a^x f(s)\,ds.
\end{aligned}
\]
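The uniform rejection method can be sketched in a few lines. The code below is an illustration under assumptions, not the listing from the notes: it uses the density f(x) = (1/2) sin(x) on [0, pi] with bound L = 1/2, and the variable names are hypothetical.

```matlab
% Uniform rejection method for a density f on [a,b] with f <= L.
f = @(x) 0.5*sin(x);       % target density (illustrative choice)
a = 0; b = pi; L = 0.5;    % support and upper bound of f
n = 1000;                  % number of accepted samples wanted
x = zeros(1,n);            % accepted values
k = 0;                     % number accepted so far
trials = 0;                % number of proposed pairs (X,Y)
while k < n
    X = a + (b-a)*rand;    % X ~ U(a,b)
    Y = L*rand;            % Y ~ U(0,L)
    trials = trials + 1;
    if Y < f(X)            % (X,Y) falls in the acceptance region
        k = k + 1;
        x(k) = X;
    end
end
fprintf('sample mean %.3f, acceptance rate %.3f\n', mean(x), n/trials);
```

For this density the mean is pi/2 by symmetry, and the acceptance region has area 1 inside a rectangle of area (b-a)L = pi/2, so the acceptance rate should be near 2/pi.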
It is clear that the efficiency of the rejection method will depend on the
probability that a random point (X, Y ) will be accepted, i.e., will fall in the
dx=xout(2)-xout(1);
fdx=(1/2)*sin(xout)*dx;
plot(xout,fdx)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
4.6
One limitation of the uniform rejection method is the requirement that the PDF
$f(x)$ be 0 on the complement of a finite interval $[a, b]$. A more general procedure,
called the envelope method for $f(x)$, can sometimes be used when the uniform
rejection method does not apply.
Suppose that we wish to simulate a random variable with PDF $f(x)$ and
that we already know how to simulate a second random variable Y with PDF
$g(x)$ having the property that $f(x) \le a g(x)$ for some positive a and all x. Note
that $a \ge 1$ since the total integral of both $f(x)$ and $g(x)$ is 1. Now consider
the following algorithm. Draw a realization of Y with the distribution density
$g(y)$ and then draw a realization of U with the uniform distribution $U(0, a g(y))$.
Repeat the procedure until a pair (Y, U) such that $U < f(Y)$ is obtained. Then
set X equal to the obtained value of Y. In other words, simulate a value from the
distribution $g(y)$ and accept this value with probability $f(y)/(a g(y))$, otherwise
reject and try again.
The method will work more efficiently if the acceptance rate is high. The
overall acceptance probability is $P(U < f(Y))$. It is not difficult to calculate
this probability as we did in the case of the uniform rejection method. (Simply
apply the integral form of the total probability formula.) The result is
\[ P(\text{accept}) = \frac{1}{a}. \]
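Proposition 4.3 (The envelope method) The envelope method for $f(x)$ described above simulates a random variable X with probability density $f(x)$.
Proof. The argument is essentially the same as for the uniform rejection method.
Note now that $P(U \le f(Y) \mid Y = s) = f(s)/(a g(s))$. With this in mind, we have:
\[
\begin{aligned}
F(x) &= P(X \le x) \\
&= P(Y \le x \mid U \le f(Y)) \\
&= \frac{P(\{Y \le x\} \cap \{U \le f(Y)\})}{P(U \le f(Y))} \\
&= \frac{\int_{-\infty}^{\infty} P(\{Y \le x\} \cap \{U \le f(Y)\} \mid Y = s)\, g(s)\,ds}{\int_{-\infty}^{\infty} P(U \le f(Y) \mid Y = s)\, g(s)\,ds} \\
&= \frac{\int_{-\infty}^{x} P(U \le f(Y) \mid Y = s)\, g(s)\,ds}{\int_{-\infty}^{\infty} P(U \le f(Y) \mid Y = s)\, g(s)\,ds} \\
&= \frac{\int_{-\infty}^{x} \frac{f(s)}{a}\,ds}{\int_{-\infty}^{\infty} \frac{f(s)}{a}\,ds} \\
&= \int_{-\infty}^{x} f(s)\,ds.
\end{aligned}
\]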
Example 4.5 (Envelope method) This is the same as the previous example,
but we now approach the problem via the envelope method. We wish to simulate
a random variable X with PDF $f(x) = (1/2)\sin(x)$ over $[0, \pi]$. We first simulate a
random variable Y with probability density $g(y)$, where
\[
g(y) =
\begin{cases}
\frac{4}{\pi^2}\, y & \text{if } y \in [0, \pi/2] \\
\frac{4}{\pi^2}\, (\pi - y) & \text{if } y \in [\pi/2, \pi].
\end{cases}
\]
Notice that $f(x) \le a g(x)$ for $a = \pi^2/8$. Therefore, the envelope method will
have probability of acceptance $1/a = 8/\pi^2 \approx 0.81$. To simulate the random variable Y,
note that $g(x) = (h * h)(x)$, where $h(x) = 2/\pi$ over $[0, \pi/2]$. Therefore, we can
take $Y = V_1 + V_2$, where the $V_i$ are independent, identically distributed uniform random variables
over $[0, \pi/2]$.
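The steps of Example 4.5 can be sketched as follows. This is an illustrative script written for this passage, not the function samplefromsine2 of the notes; all variable names are assumptions.

```matlab
% Envelope method for f(x) = (1/2) sin(x) on [0,pi], triangular envelope g.
f = @(x) 0.5*sin(x);                 % target density
g = @(y) (4/pi^2)*min(y, pi - y);    % density of Y = V1 + V2
a = pi^2/8;                          % constant with f <= a*g
n = 1000;                            % number of accepted samples wanted
x = zeros(1,n);
k = 0; trials = 0;
while k < n
    y = (pi/2)*(rand + rand);        % Y = V1 + V2 with Vi ~ U(0,pi/2)
    u = a*g(y)*rand;                 % U ~ U(0, a*g(y))
    trials = trials + 1;
    if u < f(y)                      % accept with probability f(y)/(a*g(y))
        k = k + 1;
        x(k) = y;
    end
end
fprintf('acceptance rate %.3f (1/a = %.3f)\n', n/trials, 1/a);
```

The observed acceptance rate should be close to 1/a, about 0.81, compared with about 0.64 for the uniform rejection method applied to the same density.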
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function x=samplefromsine2(n)
We study here a number of the more commonly occurring probability distributions. They are associated with basic types of random experiments that often
serve as building blocks for more complicated probability models. Among the
most important for our later study are the normal, the exponential, and the
Poisson distributions.
5.1
1
, k S.
k
k
, k S.
n
\[ \sum_{k=1}^{n} \frac{k}{n} = \frac{n(n+1)}{2n} = \frac{n+1}{2}. \]
\[ \frac{n^2 - 1}{12}. \]
5.2
Given a positive integer n and an integer k between 0 and n, recall that the
binomial coefficient is defined by
\[ C(n, k) = \binom{n}{k} = \frac{n!}{k!\,(n-k)!}. \]
It gives the number of ways to pick k elements in a set of n elements. We often
read C(n, k) as "n choose k."
The binomial distribution is the distribution of the number of successful
outcomes in a series of n independent trials, each with a probability p of success and $1 - p$ of failure. If the total number of successes is denoted X, we
write
\[ X \sim B(n, p) \]
to indicate that X is a binomial random variable for n independent trials and
success probability p. Thus, if $Z_1, \dots, Z_n$ are independent random variables
taking values in {0, 1}, and $P(Z_i = 1) = p$, $P(Z_i = 0) = 1 - p$, then
\[ X = Z_1 + \cdots + Z_n \]
is a B(n, p) random variable.
The sample space for a binomial random variable X is S = {0, 1, 2, . . . , n}.
The probability of k successes followed by $n - k$ failures is $p^k (1 - p)^{n-k}$. Indeed,
this is the probability of any sequence of n outcomes with k success trials,
independent of the order in which they occur. There are C(n, k) such sequences,
so the probability of k successes is
\[ P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}. \]
The expectation and variance of the binomial distribution are easily obtained:
\[ E[X] = np, \quad \mathrm{Var}(X) = np(1 - p). \]
Example 5.1 (Urn problem) An urn contains N balls, of which K are black
and $N - K$ are red. We draw with replacement n balls and count the number
X of black balls drawn. Let $p = K/N$. Then $X \sim B(n, p)$.
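%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y=binomial(n,p,m)
%Simulates drawing m independent samples of a
%binomial random variable B(n,p).
%Each column of rand(n,m)<=p holds n Bernoulli(p) trials.
%Summing along dimension 1 keeps the result a 1-by-m
%vector even when n=1.
y=sum(rand(n,m)<=p,1);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%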
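As a quick sanity check of the formulas E[X] = np and Var(X) = np(1-p), the following sketch draws binomial samples with the same expression used in the function above and compares sample moments with the exact values. The parameter values are arbitrary choices for this illustration.

```matlab
% Compare sample mean and variance of B(n,p) samples with np and np(1-p).
n = 20; p = 0.3; m = 50000;      % illustrative parameters
y = sum(rand(n,m) <= p, 1);      % m independent B(n,p) samples
fprintf('mean:     sample %.3f, exact %.3f\n', mean(y), n*p);
fprintf('variance: sample %.3f, exact %.3f\n', var(y), n*p*(1-p));
```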
5.3
\[ \frac{n!}{k_1! \cdots k_r!}\; p_1^{k_1} \cdots p_r^{k_r}. \]
A=zeros(n,1);
for i=1:r
A=A+i*(a<=x & x<a+p(i));
a=a+p(i);
end
y=zeros(1,r);
for j=1:r
y(j)=sum(A==j);
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
5.4
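\[ E[X] = \sum_{i=1}^{\infty} i\,P(X = i) = \sum_{i=1}^{\infty} i (1 - p)^{i-1} p = \frac{p}{(1 - (1 - p))^2} = \frac{1}{p}. \]
Similarly:
\[ E[X^2] = \sum_{i=1}^{\infty} i^2 P(X = i) = \sum_{i=1}^{\infty} i^2 (1 - p)^{i-1} p = p\,\frac{1 + (1 - p)}{(1 - (1 - p))^3} = \frac{2 - p}{p^2}, \]
\[ \frac{2 - p}{p^2} - \frac{1}{p^2} = \frac{1 - p}{p^2}. \]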
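\[ \sum_{i=1}^{\infty} a^{i-1} = \frac{1}{1 - a}, \quad \sum_{i=1}^{\infty} i\,a^{i-1} = \frac{1}{(1 - a)^2}, \quad \sum_{i=1}^{\infty} i^2 a^{i-1} = \frac{1 + a}{(1 - a)^3} \]
\[ \sum_{i=1}^{n} i = \frac{n(n + 1)}{2}, \quad \sum_{i=1}^{n} i^2 = \frac{1}{6}\, n(n + 1)(2n + 1). \]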
Example 5.3 (Waiting for a six) How long should we expect to have to wait
to get a 6 in a sequence of die tosses? Let X denote the number of tosses until
6 appears for the first time. Then the probability that X = k is
\[ P(X = k) = \left(\frac{5}{6}\right)^{k-1} \frac{1}{6}. \]
In other words we have $k - 1$ failures, each with probability 5/6, until a success,
with probability 1/6. The expected value of X is
\[ \sum_{k=1}^{\infty} k\,P(X = k) = \sum_{k=1}^{\infty} k \left(\frac{5}{6}\right)^{k-1} \frac{1}{6} = 6. \]
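The expected waiting time of Example 5.3 can be checked by simulation. The sketch below is an illustration only. Instead of tossing a simulated die until a 6 appears, it uses the inverse-transform formula X = ceil(log(U)/log(1-p)) for a geometric random variable, a substitution made here for brevity; it is valid because P(X > k) = P(U < (1-p)^k) = (1-p)^k.

```matlab
% Simulate the number of tosses until the first six, p = 1/6.
p = 1/6;                               % probability of success on each toss
m = 1e5;                               % number of simulated waits (arbitrary)
X = ceil(log(rand(1,m))/log(1-p));     % geometric samples on {1,2,...}
fprintf('sample mean %.3f, exact mean %.3f\n', mean(X), 1/p);
fprintf('P(X=1): sample %.4f, exact %.4f\n', mean(X == 1), p);
```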
5.5
A random variable X has the negative binomial distribution, also called the
Pascal distribution, denoted $X \sim NB(n, p)$, if there exists an integer $n \ge 1$ and
a real number $p \in (0, 1)$ such that
\[ P(X = n + k) = \binom{n + k - 1}{k} p^n (1 - p)^k, \quad k = 0, 1, 2, \dots \]
The negative binomial distribution has the following interpretation.
Proposition 5.1 Let $X_1, \dots, X_n$ be independent Geom(p) random variables.
Then $X = X_1 + \cdots + X_n$ has the negative binomial distribution with parameters
n and p.
Therefore, to simulate a negative binomial random variable all we need is to
simulate n independent geometric random variables, then add them up. It also
follows from this proposition that
\[ E[X] = \frac{n}{p}, \quad \mathrm{Var}(X) = \frac{n(1 - p)}{p^2}. \]
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y=negbinomial(n,p)
%Simulates one draw of a negative binomial
%random variable with parameters n and p.
y=0;
for i=1:n
a=0;
u=0;
while a==0
u=u+1;
a=(rand<p);
end
y=y+u;
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
5.6
\[ \frac{\lambda^k}{k!}\, e^{-\lambda}, \quad k = 0, 1, 2, \dots \]
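\[
\begin{aligned}
P(X = k) &= \binom{n}{k} \left(\frac{\lambda}{n}\right)^k \left(1 - \frac{\lambda}{n}\right)^{n-k} \\
&= \frac{n!}{k!\,(n-k)!} \left(\frac{\lambda}{n}\right)^k \left(1 - \frac{\lambda}{n}\right)^{n-k} \\
&= \frac{\lambda^k}{k!}\, \frac{n!}{(n-k)!\,n^k}\, \frac{(1 - \lambda/n)^n}{(1 - \lambda/n)^k} \\
&= \frac{\lambda^k}{k!}\, \frac{n}{n}\, \frac{(n-1)}{n}\, \frac{(n-2)}{n} \cdots \frac{(n-k+1)}{n}\, \frac{(1 - \lambda/n)^n}{(1 - \lambda/n)^k} \\
&\to \frac{\lambda^k}{k!}\, e^{-\lambda} \quad \text{as } n \to \infty.
\end{aligned}
\]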
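The limit above can be observed numerically. The following sketch compares the B(n, lambda/n) probabilities with the Poisson probabilities for one choice of parameters; lambda = 2 and n = 1000 are arbitrary. Logarithms of factorials are computed with gammaln to avoid overflow in n!.

```matlab
% Compare B(n, lambda/n) with the Poisson(lambda) limit.
lambda = 2; n = 1000; p = lambda/n;   % illustrative parameters
k = 0:10;
% log of the binomial probabilities: log C(n,k) + k log p + (n-k) log(1-p)
logbin = gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1) ...
         + k*log(p) + (n-k)*log(1-p);
binom = exp(logbin);
poiss = exp(k*log(lambda) - lambda - gammaln(k+1));   % lambda^k e^{-lambda}/k!
maxdiff = max(abs(binom - poiss));
fprintf('largest difference over k = 0..10: %.2e\n', maxdiff);
```

Increasing n with lambda fixed should make the reported difference smaller.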
5.7
\[ \frac{N - n}{N - 1}. \]
5.8
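A random variable X has a uniform distribution over the range $[a, b]$, written
$X \sim U(a, b)$, if the PDF is given by
\[
f_X(x) =
\begin{cases}
\frac{1}{b-a} & \text{if } a \le x \le b \\
0 & \text{otherwise.}
\end{cases}
\]
If $x \in [a, b]$, then
\[ F_X(x) = \int_{-\infty}^{x} f_X(y)\,dy = \int_a^x f_X(y)\,dy = \frac{x - a}{b - a}. \]
Therefore
\[
F_X(x) =
\begin{cases}
0 & \text{if } x < a \\
\frac{x - a}{b - a} & \text{if } a \le x \le b \\
1 & \text{if } x > b.
\end{cases}
\]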
5.9
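\[
f_X(x) =
\begin{cases}
0 & \text{if } x < 0 \\
\lambda e^{-\lambda x} & \text{if } x \ge 0.
\end{cases}
\]
\[ E[X] = \frac{1}{\lambda}, \quad \mathrm{Var}(X) = \frac{1}{\lambda^2}. \]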
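\[
\begin{aligned}
&= \frac{1 - (1 - e^{-\lambda(s+t)})}{1 - (1 - e^{-\lambda t})} \\
&= e^{-\lambda s} \\
&= 1 - (1 - e^{-\lambda s}) \\
&= 1 - F_X(s) \\
&= 1 - P(X \le s) \\
&= P(X > s).
\end{aligned}
\]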
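The chain of equalities above ends with P(X > s), and the ratio in its first line is P(X > s + t)/P(X > t) for an exponential random variable; this is the identity usually called the memoryless property. A simulation sketch of that reading follows. The values of lambda, s and t are arbitrary assumptions made for the illustration.

```matlab
% Empirical check that P(X > s+t | X > t) = P(X > s) for X ~ Exp(lambda).
lambda = 2; s = 0.3; t = 0.5;        % illustrative values
m = 1e5;                             % number of samples (arbitrary)
X = -log(rand(1,m))/lambda;          % exponential samples (transformation method)
lhs = sum(X > s + t)/sum(X > t);     % conditional frequency given X > t
rhs = mean(X > s);                   % unconditional frequency
exact = exp(-lambda*s);              % common exact value
fprintf('conditional %.4f, unconditional %.4f, exact %.4f\n', lhs, rhs, exact);
```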
The next proposition states that the inter-event times for a Poisson process
with rate $\lambda$ are exponentially distributed with parameter $\lambda$.
Proposition 5.3 Consider a Poisson process with rate $\lambda$. Let T be the time to
the first event (after 0). Then $T \sim \mathrm{Exp}(\lambda)$.
Proof. Let $N_t$ be the number of events in the interval (0, t] (for given fixed
t > 0). Then $N_t \sim \mathrm{Po}(\lambda t)$. Consider the cumulative distribution function of T:
\[
\begin{aligned}
F_T(t) &= P(T \le t) \\
&= 1 - P(T > t) \\
&= 1 - P(N_t = 0) \\
&= 1 - \frac{(\lambda t)^0 e^{-\lambda t}}{0!} \\
&= 1 - e^{-\lambda t}.
\end{aligned}
\]
h
h
h
h
as $h \to 0$. So for very small h, $P(T \le h)$ is approximately $\lambda h$ and due to the
independence property of the Poisson process, this is the probability for any time
interval of length h. The Poisson process can therefore be thought of as a process
with constant event "hazard" $\lambda$, where the hazard is essentially a measure of
event density on the time axis. The exponential distribution with parameter $\lambda$
can therefore also be reinterpreted as the time to an event of constant hazard
$\lambda$.
The next proposition describes the distribution of the minimum of a collection of independent exponential random variables.
Proposition 5.4 Let $X_i \sim \mathrm{Exp}(\lambda_i)$, i = 1, 2, . . . , n, be independent random
variables, and define $X_0 = \min\{X_1, X_2, \dots, X_n\}$. Then $X_0 \sim \mathrm{Exp}(\lambda_0)$, where
$\lambda_0 = \lambda_1 + \lambda_2 + \cdots + \lambda_n$.
\[
\begin{aligned}
&= \prod_{i=1}^{n} e^{-\lambda_i x} \\
&= e^{-x(\lambda_1 + \cdots + \lambda_n)} \\
&= e^{-\lambda_0 x}.
\end{aligned}
\]
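Proposition 5.5 Suppose that $X \sim \mathrm{Exp}(\lambda)$ and $Y \sim \mathrm{Exp}(\mu)$ are independent
random variables. Then $P(X < Y) = \lambda/(\lambda + \mu)$.
Proof.
\[
\begin{aligned}
P(X < Y) &= \int_0^{\infty} P(X < y)\, f_Y(y)\,dy \\
&= \int_0^{\infty} (1 - e^{-\lambda y})\, \mu e^{-\mu y}\,dy \\
&= \frac{\lambda}{\lambda + \mu}.
\end{aligned}
\]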
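The next result gives the likelihood of a particular exponential random variable of an independent collection being the smallest.
Proposition 5.6 Let $X_i \sim \mathrm{Exp}(\lambda_i)$, i = 1, 2, . . . , n be independent random
variables and let J be the index of the smallest of the $X_i$. Then J is a discrete
random variable with probability mass function
\[ P(J = i) = \frac{\lambda_i}{\lambda_0}, \quad i = 1, 2, \dots, n, \]
where $\lambda_0 = \lambda_1 + \cdots + \lambda_n$.
Proof. For each j, define the random variable $Y = \min_{k \neq j}\{X_k\}$ and set $\lambda_{-j} = \lambda_0 - \lambda_j$.
By Proposition 5.4, $Y \sim \mathrm{Exp}(\lambda_{-j})$, and Y is independent of $X_j$, so Proposition 5.5 applies. Then
\[
\begin{aligned}
P(J = j) &= P(X_j < \min_{k \neq j}\{X_k\}) \\
&= P(X_j < Y) \\
&= \frac{\lambda_j}{\lambda_j + \lambda_{-j}} \\
&= \frac{\lambda_j}{\lambda_0}.
\end{aligned}
\]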
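Propositions 5.4 and 5.6 are easy to test by simulation. The sketch below is illustrative only; the rates 1, 2, 3 are an arbitrary choice. It draws independent exponential variables column by column, takes the minimum and its index, and compares the sample mean of the minimum with 1/lambda_0 and the index frequencies with lambda_i/lambda_0.

```matlab
% Minimum of independent exponentials and the index of the smallest one.
lambda = [1 2 3];                      % rates lambda_1..lambda_n (illustrative)
n = numel(lambda);
m = 1e5;                               % number of repetitions (arbitrary)
lambda0 = sum(lambda);
% Row i of X holds m samples of Exp(lambda_i).
X = -log(rand(n,m))./repmat(lambda(:),1,m);
[X0, J] = min(X, [], 1);               % minimum and its index in each column
freq = zeros(1,n);
for i = 1:n
    freq(i) = mean(J == i);            % observed frequency of J = i
end
fprintf('mean of minimum %.4f, 1/lambda0 = %.4f\n', mean(X0), 1/lambda0);
disp([freq; lambda/lambda0]);          % observed vs exact P(J = i)
```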
From the formula for a linear transformation of a random variable we immediately have:
Proposition 5.7 Let $X \sim \mathrm{Exp}(\lambda)$. Then for $\alpha > 0$, $Y = \alpha X$ has distribution
$Y \sim \mathrm{Exp}(\lambda/\alpha)$.
5.10
(x)n1 x
e
.
(n 1)!
5.11
2
for < x < and > 0.
Note that the PDF is symmetric about $x = \mu$, so the median and mean of
the distribution will be $\mu$. Checking that the density integrates to 1 requires
the well-known integral
\[ \int_{-\infty}^{\infty} e^{-\alpha x^2}\,dx = \sqrt{\frac{\pi}{\alpha}}, \quad \alpha > 0. \]
We leave the calculation of this and the variance as an exercise. The result is
\[ E[X] = \mu, \quad \mathrm{Var}(X) = \sigma^2. \]
The random variable Z is said to have the standard normal distribution if
$Z \sim N(0, 1)$. Therefore, the density of Z, which is usually denoted $\phi(z)$, is given
by
\[ \phi(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2} z^2\right) \]
for $-\infty < z < \infty$. The cumulative distribution function of a standard normal
random variable is denoted $\Phi(z)$, and is given by
\[ \Phi(z) = \int_{-\infty}^{z} \phi(x)\,dx. \]
If $X \sim N(\mu, \sigma^2)$, then
\[ \frac{X - \mu}{\sigma} \sim N(0, 1), \]
and so the cumulative probabilities for any normal random variable can be
calculated using the tables for the standard normal distribution.
The sum of normal random variables is also a normal random variable. This
is shown in the following proposition.
Proposition 5.8 If $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$ are independent normal random variables, then $Y = X_1 + X_2$ is also normal and
\[ Y \sim N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2). \]
The elementary proof will be left as an exercise.
Therefore, any linear combination of independent normal random variables is
also a normal random variable. The mean and variance of the resulting random
variable can then be calculated from the proposition.
5.12
A more efficient procedure for simulating normal random variables is the so-called Box-Muller method. This consists of first simulating a uniform and an
exponential random variable independently:
\[ \Theta \sim U(0, 2\pi), \quad \text{and} \quad R^2 \sim \mathrm{Exp}(1/2). \]
Then
\[ X_1 = R\cos(\Theta), \quad X_2 = R\sin(\Theta) \]
are two independent standard normal random variables. The following proposition is needed to justify this claim.
Proposition 5.9 Let $X_1$ and $X_2$ be random variables with values in $\mathbb{R}$. Let R
and $\Theta$ be the radius and angle expressing the vector valued random variable $X = (X_1, X_2)$
in polar coordinates. Then $X_1$, $X_2$ are independent standard normal
random variables if and only if $R^2$ and $\Theta$ are independent with $R^2 \sim \mathrm{Exp}(1/2)$
and $\Theta \sim U(0, 2\pi)$.
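Proof. The PDFs of $(X_1, X_2)$ and of $(R, \Theta)$ are related by the change of coordinates
\[ x_1 = r\cos(\theta), \quad x_2 = r\sin(\theta). \]
First assume that $X_1$ and $X_2$ are independent standard normal random variables. By independence, the PDF of the vector random variable X is the product
of the respective PDFs
\[ f_X(x_1, x_2) = f_1(x_1) f_2(x_2) = \frac{1}{2\pi}\, e^{-\frac{1}{2}(x_1^2 + x_2^2)}. \]
Using the general change of coordinate formula, we obtain
\[
\begin{aligned}
f_{(R,\Theta)}(r, \theta) &= f_{X_1, X_2}(x_1, x_2) \left| \frac{\partial(x_1, x_2)}{\partial(r, \theta)} \right| \\
&= \frac{1}{2\pi}\, e^{-r^2/2} \left| \det \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix} \right| \\
&= \frac{1}{2\pi}\, r e^{-r^2/2}.
\end{aligned}
\]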
\[ \frac{1}{2}\, e^{-\frac{1}{2}u}. \]
This shows that $R^2$ and $\Theta$ are as claimed. The converse is shown similarly.
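The Box-Muller method described above can be sketched as follows. This is an illustrative script rather than code from the notes. It uses the transformation method to obtain R^2 with distribution Exp(1/2), namely R^2 = -2 log(U), and then checks the first two sample moments and the sample correlation of the resulting pair.

```matlab
% Box-Muller method: two independent standard normals from two uniforms.
n = 1e5;                       % number of pairs (arbitrary)
Theta = 2*pi*rand(1,n);        % Theta ~ U(0, 2*pi)
R2 = -2*log(rand(1,n));        % R^2 ~ Exp(1/2), by the transformation method
R = sqrt(R2);
X1 = R.*cos(Theta);
X2 = R.*sin(Theta);
c = mean(X1.*X2);              % sample covariance (both means are near 0)
fprintf('means %.3f %.3f, variances %.3f %.3f, covariance %.3f\n', ...
        mean(X1), mean(X2), var(X1), var(X2), c);
```

The means and covariance should be close to 0 and the variances close to 1.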
5.13 The $\chi^2_n$ distribution
\[ \sum_{i=1}^{n} Z_i^2 \]
has a $\chi^2_n$ distribution.
5.14
It is not difficult to show from the definition that $\Gamma(1) = 1$ and $\Gamma(x + 1) = x\Gamma(x)$.
If x = n is a positive integer, it follows that $\Gamma(n + 1) = n!$. Also worth noting,
$\Gamma(1/2) = \sqrt{\pi}$.
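A random variable X has a gamma distribution with parameters $\alpha, \beta > 0$,
written $X \sim \Gamma(\alpha, \beta)$, if it has PDF
\[
f(x) =
\begin{cases}
\frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\beta x} & \text{if } x > 0 \\
0 & \text{if } x \le 0.
\end{cases}
\]
Note that $\Gamma(1, \lambda) = \mathrm{Exp}(\lambda)$, so the gamma distribution is a generalization
of the exponential distribution.
It is also not difficult to show that if $X \sim \Gamma(\alpha, \beta)$, then
\[ E[X] = \frac{\alpha}{\beta}, \quad \mathrm{Var}(X) = \frac{\alpha}{\beta^2}. \]
\[ = \frac{\alpha}{\beta} \int_0^{\infty} \frac{\beta^{\alpha + 1}}{\Gamma(\alpha + 1)}\, x^{\alpha} e^{-\beta x}\,dx = \frac{\alpha}{\beta}. \]
Figure 8 shows the graph of the PDF function for $\Gamma(4, 1)$.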
We note the following property of gamma random variables. The proof is
left as an exercise.
5.15
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y=beta(a,b,n)
%Simulates n independent realizations of a Beta(a,b)
%random variable.
x1=gamma(a,n);
x2=gamma(b,n);
y=x1./(x1+x2);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
6.1 Convolution of PDFs
Exercise 6.8 Let $f(x)$ and $g(x)$ be two functions of a real variable $x \in \mathbb{R}$.
Suppose that $f(x)$ is zero for x in the complement of the interval $[a, b]$, and
$g(x)$ is zero in the complement of $[c, d]$. Show that $(f * g)(x)$ is zero in the
complement of $[a + c, b + d]$. Hint: show that the convolution of the indicator
functions of the first two intervals is zero in the complement of the third.
Exercise 6.9 Suppose that $f(x)$ and $g(x)$ are two functions of a real variable
$x \in \mathbb{R}$ which are zero outside of the intervals $[a, b]$ and $[c, d]$, respectively. We
wish to obtain an approximation formula for the convolution $h = f * g$ of f and
g by discretizing the convolution integral. Assume that the lengths of the two
intervals are multiples of a common small positive step size e. This means that
there are positive integers N and M such that
\[ \frac{b - a}{N} = e = \frac{d - c}{M}. \]
Show that the approximation of $(f * g)(x)$ over the interval $[a + c, b + d]$ by
Riemann sum discretization of the convolution integral is
\[ h(x_j) = \sum_{i = \max\{1,\, j - N\}}^{\min\{j,\, M + 1\}} f(a + (j - i)e)\, g(c + (i - 1)e)\, e, \]
for j = 1, . . . , N + M + 1.
The following script implements the approximation for the convolution integral to obtain the nth convolution power of a function f .
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function g=convolution(f,a,b,n)
%Input  - f vector discretization of a function
%       - a and b are the left and right endpoints
%         of an interval outside of which f is zero
%       - n degree of convolution
%Output - g vector approximating the degree n
%         convolution of f with itself
%         over the interval [na, nb]
N=length(f)-1;
e=(b-a)/N;                 %step size of the discretization
g=f;
for k=2:n
    h=zeros(1,k*N+1);      %k-fold convolution lives on [ka, kb]
    for j=1:k*N+1
        for i=max([j-N,1]):min([j,(k-1)*N+1])
            h(j)=h(j)+f(j-i+1)*g(i)*e;
        end
    end
    g=h;
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Exercise 6.10 Let f be the function $f(x) = cx^2$ over the interval $[-1, 1]$, where
c is a normalization constant. We discretize it and write in Matlab:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
x=[-1:0.01:1]; f=x.^2; f=f/sum(f);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
To find the convolution power $f^{*n} = f * \cdots * f$ of degree n, we invoke the
function convolution defined above. This is done with the command
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
g=convolution(f,-1,1,n);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
When plotting g, keep in mind that it may be non-zero over the bigger interval
[na, nb]. So write now x=[-n:0.01:n]; plot(x,g). Draw the graphs of f and
the convolution powers of degree n = 2, 5 and 20.
6.2
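%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
rand('seed',121)
p=ones(1,6)/6; %Probabilities of the outcomes of tossing a die
N=1000;        %Number of tosses in each trial
M=500;         %Number of trials
y=[];
for i=1:M
    y=[y sum(samplefromp(p,N))];
end
[n,xout]=hist(y,20); %n is the count in each bin, and xout
                     %is a vector giving the bin locations
hold off
stem(xout,n/M)
%The next four definitions are missing from this copy of the
%script; they are reconstructed here from Example 3.1.
m=3.5*N;             %mean of the sum of N tosses
s=sqrt(35*N/12);     %standard deviation of the sum
x=xout;              %evaluate the normal density at the bin locations
dx=xout(2)-xout(1);  %bin width
f=(1/(s*sqrt(2*pi)))*exp(-0.5*((x-m)/s).^2)*dx;
hold on
plot(x,f)
grid
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%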
Exercise 6.11 (Transformation method) Write a program to simulate a
random variable X with probability density function $f(x) = 3x^2$ over the interval [0, 1] using the transformation method. Simulate 1000 realizations of X and
plot a stem plot with 20 bins. On the same coordinate system, superpose the
graph of $f(x)$ (appropriately normalized so as to give the correct frequencies of
each bin).
Exercise 6.12 (Uniform rejection method) Write a program to simulate a
random variable taking values in $[-1, 1]$ with probability density
\[ f(x) = \frac{3}{4}(1 - x^2). \]
Use the uniform rejection method. Simulate 1000 realizations of X and plot a
stem plot with 20 bins. On the same coordinate system, superpose the graph of
$f(x)$ (appropriately normalized so as to give the correct frequencies of each bin).
6.3 Normal distributions
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y=normalpdf(x,mu,sigma)
%Input  - mu, sigma: parameters of normal distribution
%       - x a real number (or a vector of real numbers)
%Output - value of the pdf at x
%The deviation x-mu is divided by sigma inside the exponent.
y=exp(-0.5*((x-mu)/sigma).^2)/(sigma*sqrt(2*pi));
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
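Binomial
\[ P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}, \quad k = 0, 1, \dots, n. \]
Multinomial
\[ P(n_1, n_2, \dots, n_k) = \frac{\left(\sum_{i=1}^{k} n_i\right)!}{n_1!\, n_2! \cdots n_k!} \prod_{i=1}^{k} p_i^{n_i}, \quad \sum_{i=1}^{k} p_i = 1. \]
Geometric
\[ P(X = k) = (1 - p)^{k-1} p, \quad k = 1, 2, \dots \]
Poisson
\[ P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \dots \]
Uniform
\[ f(x) = \frac{1}{b - a}\, I_{[a,b]}(x), \quad x \in \mathbb{R}. \]
Negative binomial
\[ P(X = n + k) = \binom{n + k - 1}{k} p^n (1 - p)^k, \quad k = 0, 1, 2, \dots \]
Exponential
\[ f(x) = \lambda e^{-\lambda x}, \quad x \ge 0. \]
Gamma
\[ f(x) = \frac{\lambda^n x^{n-1} e^{-\lambda x}}{\Gamma(n)}, \quad x \ge 0. \]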
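Chi-square
\[ f(x) = \frac{x^{\nu/2 - 1} e^{-x/2}}{\Gamma(\nu/2)\, 2^{\nu/2}}, \quad x \ge 0. \]
t
\[ f(x) = \frac{\Gamma((\nu + 1)/2)}{\sqrt{\nu\pi}\; \Gamma(\nu/2)} \left(1 + \frac{x^2}{\nu}\right)^{-(\nu + 1)/2}, \quad x \in \mathbb{R}. \]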
Cauchy
f (x) =
1
, x R.
2 + (x )2
Weibull
1 x 2
1
e 2 ( ) , x R.
2
Half-normal
1
f (x) =
2
2 21 ( x
) ,
e
x 0.
Multivariate normal
1
f (x) = (2)k/2 | det(C)|1/2 exp (x )0 C 1 (x ) , x Rk .
2
Logistic
f (x) =
e(+x)
, x R.
(1 + e(+x) )2
f (x) =
e x(+1)
, x 0.
(1 + e x )2
Log-logistic