You are on page 1of 26

3

Discrete Random Variables and Probability Distributions

Contents
3.1

Discrete Random Variables ....................................................................................................................... 1

3.2

Probability Distributions and Probability Mass Functions (pmf) ............................................. 2

3.3

Cumulative Distribution Functions........................................................................................................ 5

3.4

Mean and Variance of a Discrete Random Variable ........................................................................ 7

3.6

Binomial Distribution ............................................................................................................................... 10

3.7

Geometric and Negative Binomial Distributions .......................................................................... 15

3.8

Hyper Geometric Distribution .............................................................................................................. 22

3.9

Poisson Distribution ................................................................................................................................. 24

3.1

Discrete Random Variables

Many physical systems can be modeled by the same or similar random experiments and
random variables. The distribution of the random variables involved in each of these
common systems can be analyzed, and the results of that analysis can be used in
different applications and examples. In this chapter, we present the analysis of several
random experiments and discrete random variables that frequently arise in
applications. We often omit a discussion of the underlying sample space of the random
experiment and directly describe the distribution of a particular random variable.
Example 3.1.1
For each of the following exercises, determine the range (possible values) of the random
variable.
(i)

The random variable is the number of nonconforming solder connections on


a printed circuit board with 1000 connections.

Solution: The range of X is {0,1,2,...,1000}


(ii)

A batch of 500 machined parts contains 10 that do not conform to customer


requirements. Parts are selected successively, without replacement, until a
nonconforming part is obtained. The random variable is the number of parts
selected.

Solution: The range of X is {1,2,...,491}. Because 490 parts are conforming, a


nonconforming part must be selected in 491 selections.

3.2

Probability Distributions and Probability Mass Functions (pmf)

Random variables are so important in random experiments that sometimes we


essentially ignore the original sample space of the experiment and focus on the
probability distribution of the random variable.
The probability distribution of a random variable X is a description of the
probabilities associated with the possible values of X. For a discrete random variable,
the distribution is often specified by just a list of the possible values along with the
probability of each.
Example 3.2.1
The possible values for X are {0, 1, 2 } and the probabilities are
f (0) = P(X = 0) = 0.35

f(1)= P(X = 1) = 0.24

f(2) = P(X = 2) = 0.41

The probability distribution of X is specified by the possible values along with the
probability of each. A graphical description of the probability distribution of X is shown
in Fig. 3-1.
0.45
0.4
0.35
0.3
0.25
0.2
0.15
0.1
0.05
0
0

Fig. 3-1 Probability distribution


Definition
For a discrete random variable X with possible values x1, x2, . . . , xn, a probability mass
function is a function such that
(a)

f ( xi ) 0
n

(b)
(c)

f (x ) 1
i 1

f ( xi ) P( X xi )

(3-1)

Example 3.2.2
An optical inspection system is to distinguish among different part types. The
probability of a correct classification of any part is 0.95. Suppose that three parts are
inspected and that the classifications are independent. Let the random variable X denote
the number of parts that are correctly classified.
(a)

Determine the probability mass function of X.

(b)

Use the probability mass function to determine the following probabilities:


i.
ii.
iii.
iv.
v.

(c)

P (X = 2)
P (0.5 < X < 2.5)
P (X > 2)
P (0 X < 2)
P (X = 0 or X = 2)

Verify that the following functions are probability mass functions, and determine
the requested probabilities.

Solution
(a)

P ( X 0) (0.05)3 0.000125
P ( X 1) 3(0.95)(0.05)(0.05) 0.007125
P ( X 2) 3(0.95)(0.95)(0.05) 0.135375
P ( X 3) (0.95)3 0.0.857375
The probability mass function of X.
x
P (X = x )

0
0.000125

(b)
i.
ii.
iii.
iv.
v.
(c)

P (X = 2) =
P (0.5 < X < 2.5) =
P (X > 2) =
P (0 X < 2) =
P (X = 0 or X = 2) =

1
0.007125

2
0.135375

3
0.857375

Example 3.2.3
x

Given pmf as

8 1
f ( x) ,
7 2

x 1, 2, 3 .

Find

(a) P (X 1)

(b) P (X > 1)

(c) P (2< X < 6)

(d) P (X 1 or X > 1)

Solution

(a)

P (X

1) = P(X =1)
1

8 1
= 0.5714
7 2

(b)

P (X > 1) = P(X =2) + P(X =3)


8 1
=

7 2

8 1

7 2

= 0.2857 + 0.1429
OR:

= 0.4286

P (X >1) = 1 P(X =1) =1 0.5714 0.4286


3

(c)

8 1
P (2 < X < 6)= P(X=3) = 0.1429
7 2

(d)

P(X 1 or X>1) = P(X =1) + P(X =2) + P(X =3)


1

8 1
8 1
8 1
=
7 2
7 2
7 2
=1

3.3

Cumulative Distribution Functions

Definition
The cumulative distribution function of a discrete random variable X, denoted as F(x), is

F ( x) P( X x) =

xi x

f ( xi )

For a discrete random variable X, F(x) satisfies the following properties.

(i )

F ( x) P( X x)

f (x )

xi x

(ii ) 0 F ( x) 1
(iii ) If x y, then F ( x) F ( y )

(3-2)

Example 3.3.1
Determine the probability mass function of X from the following cumulative distribution
function, and plot F(x)

Solution
f (2) = P (X = 2)
= P(X 2) P( X < 2)
= 0.2 0 = 0.2

f (0) = P(X = 0)
= P(X 0) P(X 2)
= 0.7 0.2 = 0.5

Plot F(x) : Cumulative distribution function for Example 3.3.1

f (2) = P(X = 2)
= 1 P(X < 2)
= 1 0.7 = 0.3

Example 3.3.2
Determine the cumulative distribution function for the random variable in Table 1; also
determine the following probabilities:
Table 1:

(a)
(b)
(c)
(d)

f (x)

1
8

2
8

2
8

2
8

1
8

P(X 1.25)
P(X 2.2)
P ( 1.1 X 1)
P(X > 0)

Solution
Cumulative distribution function:

3.4

Mean and Variance of a Discrete Random Variable

Two numbers (mean and variance) are often used to summarize a probability
distribution for a random variable X.
The mean is a measure of the center or middle of the probability distribution and the
variance is a measure of the dispersion, or variability in the distribution.
Definition
The mean or expected value of the discrete random variable X, denoted as or E(X), is

E( X ) =

xf ( x)

(3-3)

The variance of X, denoted as 2 or V(X) or Var(X), is

2 V (X )
E ( X )2

(x )

f ( x)

f ( x) 2

The standard deviation of X is =

Proof:

Fig 3-4(a): Illustration of probability distributions with equal means but different
variances.

A probability distribution can be viewed as a loading with the mean equal to the balance
point. Parts (a) and (b) illustrate equal means, but Part (a) illustrates a larger variance.

Fig 3-4(b): Illustration of probability distributions with equal means variances.

The probability distributions illustrated in Parts (a) and (b) differ even though they
have equal means and equal variances.

Example 3.4.1
There is a chance that a bit transmitted through a digital transmission channel is
received in error. Let X equal the number of bits in error in the next four bits
transmitted. The possible values for X are {0, 1, 2, 3, 4}. Based on a model for the errors
that is presented in the following section, probabilities for these values will be
determined. Suppose that the probabilities are
x

f ( x)

0.6561

0.2916

0.0486

0.0036

0.0001

Now

E( X ) =

xf ( x)
x

= 0 f (0) 1 f (1) 2 f (2) 3 f (3) 4 f (4)


= 0(0.6561) 1(0.2916) 2(0.0486) 3(0.0036) 4(0.0001)
0.4

2 V ( X ) ( x) 2 f ( x) 2

= 02 f (0) 12 f (1) 22 f (2) 32 f (3) 4 2 f (4) 2


= 0(0.6561) 1(0.2916) 4(0.0486) 9(0.0036) 16(0.0001) (0.4) 2
0.36

Exercise 3.4.2
The number of messages sent per hour over a computer network has the
following distribution:
x = number of
messages

f ( x)

10

11

12

13

14

14

0.08

0.15

0.30

0.20

0.20

0.07

Determine the mean and standard deviation of the number of messages sent per hour
Solution

Example 3.4.3
The distributor of a machine for cytogenics has developed a new model. The company
estimates that when it is introduced into the market, it will be very successful with a
probability 0.6, moderately successful with a probability 0.3, and not successful with
probability 0.1. The estimated yearly profit associated with the model being very
successful is $15 million and being moderately successful is $5 million; not successful
would result in a loss of $500,000. Let X be the yearly profit of the new model.
(a)
(b)

Determine the probability mass function of X.


Determine the mean and variance of the random variable.

Solution

3.6 Binomial Distribution


Examples of Binomial distribution
Consider the following random experiments and random variables:
1.

Flip a coin 10 times. Let X = number of heads obtained.

2.

Aworn machine tool produces 1% defective parts. Let X = number of defective


parts in the next 25 parts produced.

3.

Each sample of air has a 10% chance of containing a particular rare molecule. Let
X = the number of air samples that contain the rare molecule in the next 18
samples analyzed.

4.

Of all bits transmitted through a digital transmission channel, 10% are received
in error. Let X = the number of bits in error in the next five bits transmitted.

5.

A multiple choice test contains 10 questions, each with four choices, and you
guess at each question. Let X = the number of questions answered correctly.

6.

In the next 20 births at a hospital, let X = the number of female births.

7.

Of all patients suffering a particular illness, 35% experience improvement from a


particular medication. In the next 100 patients administered the medication, let X
= the number of patients who experience improvement.

These examples illustrate that a general probability model that includes these
experiments as particular cases would be very useful.

Each of these random experiments can be thought of as consisting of a series of


repeated, random trials: such as 10 flips of the coin and etc.

The random variable in each case is a count of the number of trials that meet a specified
criterion. The outcome from each trial either meets the criterion that X counts or it does
not; consequently, each trial can be summarized as resulting in either a success or a
failure.

For example:
In experiment 2, because X counts defective parts, the production of a defective part
is called a success.

A trial with only two possible outcomes is used so frequently as a building block of a
random experiment that it is called a Bernoulli trial. It is usually assumed that the
trials that constitute the random experiment are independent. This implies that the
outcome from one trial has no effect on the outcome to be obtained from any other trial.

Furthermore, it is often reasonable to assume that the probability of a success in each


trial is constant. In the multiple choice experiment (4 choice), if the test taker has no
knowledge of the material and just guesses at each question, we might assume that the
probability of a correct answer is for each question.

Definition
A random experiment consists of n Bernoulli trials such that
(1) The trials are independent
(2) Each trial results in only two possible outcomes, labeled as success and failure
(3) The probability of a success in each trial, denoted as p, remains constant.

The random variable X that equals the number of trials that result in a success has a
binomial random variable with parameters 0 < p < 1 and n = 1, 2, 3, ... . The probability
mass function of X is

f (x) = ( )p x (1 p) n x

x = 0, 1, 2, . . . , n

Example 3.6.1
The random variable X has a binomial distribution with n = 10 and p = 0.3. Sketch the
probability mass function of X.
(a) What value of X is most likely? E(x)
(b) What value(s) of X is(are) least likely?
Solution

P(X = x) = ( )p x (1 p) n x = ( n Cx ) p x (1 p) n x

Example 3.6.2
The random variable X has a binomial distribution with n = 10 and p = 0.5. Determine
the following probabilities.
(a)
(b)

P(X = 5)
P(X 9)

(c)
(d)

P(X 2)
P(3 X < 5)

Solution

Example 3.6.3
Determine the cumulative distribution function of a binomial random variable with n =3
and p = 0.01
Solution

The probability expression above is a very useful formula that can be applied in a
number of examples. The name of the distribution is obtained from the binomial
expansion. For constants a and b, the binomial expansion is

( a b) n

n k n k

a b
k 0 k
n

Let p denote the probability of success on a single trial. Then, by using the binomial
expansion with a = p and b = 1 p, we see that the sum of the probabilities for a
binomial random variable is 1. Furthermore, because each trial in the experiment is
classified into two outcomes, {success, failure}, the distribution is called a bi-nomial.
For a fixed n, the distribution becomes more symmetric as p increases from 0 to 0.5 or
decreases from 1 to 0.5. For a fixed p, the distribution becomes more symmetric as n
increases.

Binomial distributions for selected values of n and p


The mean and variance of a binomial random variable depend only on the parameters
p and n.

Definition
If X is a binomial random variable with parameters p and n,

E ( x) np and 2 V ( X ) np(1 p)
(3-8)

Example 3.6.4
Batches that consist of 50 coil springs from a production process are checked for
conformance to customer requirements. The mean number of nonconforming coil
springs in a batch is 5. Assume that the number of nonconforming springs in a batch,
denoted as X, is a binomial random variable.
(a) What are n and p?
(b) What is P( X 2) ?
(c) What is P( X 49) ?
(d) What is P( X 48) ?
Solution
.

Example 3.6.5
Determine the cumulative distribution function of a binomial random variable
with n = 5 and p = 0.3. Find mean and variance of a binomial random variable.
Solution

Example 3.6.6
A statistical process control chart example. Samples of 20 parts from a metal punching
process are selected every hour. Typically, 1% of the parts require rework. Let X
denotes the number of parts in the sample of 20 that require rework. A process problem
is suspected if X exceeds its mean by more than three standard deviations.
(a) If the percentage of parts that require rework remains at 1%, what is the
probability that X exceeds its mean by more than three standard deviations?
(b) If the rework percentage increases to 4%, what is the probability that X exceeds 1?
(c)

If the rework percentage increases to 4%, what is the probability that X exceeds 1
in at least one of the next five hours of samples?

Solution

3.7

Geometric and Negative Binomial Distributions

3-7.1 Geometric Distribution


Assume a series of Bernoulli trials (independent trials with constant probability p of a
success on each trial). However, instead of a fixed number of trials, trials are conducted
until a success is obtained.
Let the random variable X denote the number of trials until the first success.

A geometric random variable has been defined as the number of trials until the first
success.
Definition
In a series of Bernoulli trials (independent trials with constant probability p of a
success), let the random variable X denote the number of trials until the first success.
Then X is a geometric random variable with parameter 0 < p < 1 and
f ( x) = (1 p) x 1 p

x = 1, 2, 3. . .
(3-9)

Below is an examples of the probability mass functions for geometric random variables.
The probabilities decrease in a geometric progression

Geometric
distributions for
selected values of
the parameter p.

If X is a geometric random variable with parameter p, then

E( X )

1
p

and

2 V (X )

(1 p)
p2

Example 3.6.7
The probability that a bit transmitted through a digital transmission channel is received
in error is 0.1. Assume the transmissions are independent events, and let the random
variable X denote the number of bits transmitted until the first error. Find P(X = 5),
mean and variance.
Solution

For P(X = 5) means the probability that the first four bits are transmitted correctly and
the fifth bit is in error. Means:
Correct
(0.9)

Correct
(0.9)

Correct
(0.9)

Correct
(0.9)

Error
(0.1)

Because the trials are independent and the probability of a correct transmission is 0.9,
P(X=5) = (0.9)4 (0.1)1 = 0.066
The mean number of bits until the next error is

E( X )

1
1
=
=10
p
0.1

(the same result as the mean number of bits until the

first error)

2 V (X )

and

(1 0.1)
90
0.12

Example 3.6.8
The probability of a successful optical alignment in the assembly of an optical data
storage product is 0.8. Assume the trials are independent.
(a) What is the probability that the first successful alignment requires exactly four
trials?
(b) What is the probability that the first successful alignment requires at most four
trials?
(c)

What is the probability that the first successful alignment requires at least four
trials?

Solution
Let X denote the number of trials to obtain the first successful alignment. Then X is a
geometric random variable with p = 0.8
(a)

P (X = 4)

(b)

P (X 4) = P (X = 1) + P (X = 2) + P (X = 3) + P (X = 4)

(c)

P (X 4) =

Exercise
An old gas water heater has a pilot light which much be lit manually, using a match. The
probability of successfully lighting the pilot light on any given attempt is 82%.
(a)

Compute the probability that it takes more than four tries to light the pilot light.

(b)

Compute the probability that the pilot light is lit on the 5th try.

(c)

Compute P 4 X 8 .

Solution

3-7.2 Negative Binomial Distribution


A generalization of a geometric distribution in which the random variable is the number
of Bernoulli trials required to obtain r successes results in the negative binomial
distribution.

Definition

Mean and variance of Negative Binomial Distribution

Example 3.7.1
Suppose the probability is 0.8 that any given student will understand a lecture about the
negative binomial distribution. What is the probability that
a)

the sixth student to hear this lecture is the fourth one to understand?

b)

the sixth student to hear this lecture is the first one to understand it?

c)

the third student to hear this lecture is the first one to understand it?

d)

the third student to hear this lecture is the second one to understand it?

Solution

Because at least r trials are required to obtain r successes, the range of X is from r to .
In the special case that r = 1, a negative binomial random variable is a geometric
random variable.
Selected negative binomial distributions are illustrated in following figure.

Negative binomial distributions for selected values of the parameters r and p.


Example 3.7.2
The probability of a successful optical alignment in the assembly of an optical data
storage product is 0.8. Assume the trials are independent.
(a)

What is the probability that the first successful alignment requires exactly four
trials?

(b)

What is the probability that the first successful alignment requires at most four
trials?

(c)

What is the probability that the first successful alignment requires at least four
trials?

Solution
Let X denote the number of trials to obtain the first successful alignment. Then X is a
geometric random variable with p = 0.8
(a)

(b)

(c)

Notes:
A binomial random variable is a count of the number of successes in n
Bernoulli trials. That is, the number of trials is predetermined, and the
number of successes is random.
A negative binomial random variable is a count of the number of trials
required to obtain r successes. That is, the number of successes is
predetermined, and the number of trials is random.
In this sense, a negative binomial random variable can be considered the opposite /
negative of a binomial random variable.

Example 3.7.3
A fault-tolerant system that processes transactions for a financial services firm uses
three separate computers. If the operating computer fails, one of the two spares can be
immediately switched online. After the second computer fails, the last computer can be
immediately switched online. Assume that the probability of a failure during any
transaction is and that the transactions can be considered to be independent events.
(a) What is the mean number of transactions before all computers have failed?
(b) What is the variance of the number of transactions before all computers have failed?

Solution
Let X denotes the number of transactions until all computers have failed. Then, X is
negative binomial random variable with p = 108 and r = 3.
(a)

E( X )

r
3

p
108

3 x108

3(1 108 )

3 x1016
(a) V ( X )
16
10

3.8

Hyper Geometric Distribution


N

K
N-K

A set of N objects contains K objects classified as successes and N K objects classified


as failures. A sample of size n objects is selected randomly (without replacement) from
the N objects, where K N and n N .
Let the random variable X denotes the number of successes in the sample. Then X is a
hypergeometric random variable and

( = ) = () =


( )(
)

( )

where x max{0, n K N} to min{K , n}


The expression min{K , n} is used in the definition of the range of X because the
maximum number of successes that can occur in the sample is the smaller of the sample
size, n, and the number of successes available, K. Also, if n + K > N at n + K N least
successes must occur in the sample.
If X is a hypergeometric random variable with parameters N , K and n, then
= E(X) = np

and 2 = V(X) = np(1 p)( 1)

where p = and ( 1) is finite population correction factor.


Example 3.8.1
Suppose X has a hypergeometric distribution with N = 20, n = 4, and K =4. Determine the
following:
(a)

P( X = 1 )

(b) P( X = 4)

(d)

Determine the mean and variance of X.

Solution
Using:

( = ) = () =


( )(
)

( )

(c) P( x 2)

K objects can be categorised into more than one set. Example:

N
K1
K4
K2
K3

Example 3.8.2
A foreign student club lists as its members 2 Canadians, 3 Japanese, 5 Italians, and 2
Germans. If a committee of 4 is selected at random, find the probability that
(a)

all nationalities are represented;

(b)

all nationalities except the Italians are represented.

Example 3.8.3
The probability is 0.6 that a calibration of a transducer in an electronic instrument
conforms to specifications for the measurement system. Assume the calibration
attempts are independent. What is the probability that the first calibration meet the
specifications for the measurement system are exactly five attempts?

3.9

Poisson Distribution

Definition
Given an interval of real numbers, assume counts occur at random throughout the
interval. If the interval can be partitioned into subintervals of small enough length such
that
(1) the probability of more than one count in a subinterval is zero,
(2) the probability of one count in a subinterval is the same for all subintervals and
proportional to the length of the subinterval, and
(3) the count in each subinterval is independent of other subintervals, the random
experiment is called a Poisson process.

The mean and variance of a Poisson random variable are equal. For example, if particle
counts follow a Poisson distribution with a mean of 25 particles per square centimeter,
the variance is also 25 and the standard deviation of the counts is 5 per square
centimeter. Consequently, information on the variability is very easily obtained.
Conversely, if the variance of count data is much greater than the mean of the same
data, the Poisson distribution is not a good model for the distribution of the random
variable.
Example 3.9.1
Suppose X has a Poisson distribution with a mean of 4. Determine the following
probabilities:

Solution

Example 3.9.2
The number of failures of a testing instrument from contamination particles on the
product is a Poisson random variable with a mean of 0.02 failure per hour.
(a) What is the probability that the instrument does not fail in an 8-hour shift?
(b) What is the probability of at least one failure in a 24-hour day?
Solution

Example 3.9.3
Suppose that the number of customers that enter a bank in an hour is a Poisson
random variable, and suppose that P(X=0) = 0.05. Determine the mean and variance
of X.
Solution
P(X=0) = 0.05 P(X=0) =e = 0.05
= ln (0.05) = 2.996
E(X) = = 2.996
V(X) = = 2.996

Suggestion Exercises Chapter 3:


Topic

Exercises

3.1

3-3,

3.2

3-15, 3-16, 3-17,

3-24, 3-25

3.3

3-28, 3-29, 3-31,

3-32, 3-33, 3-36

3.4

3-38, 3-39, 3-40,

3-41, 3-45

3.6

3-58, 3-59, 3-62,

3-63, 3-68, 3-70,

3.7

3-73, 3-75, 3-77,

3-81, 3-82

3.8

3-86, 3-90, 3-92,

3-94

3.9

3-100, 3-101, 3-104, 3-105

S.E

3-108, 3-111, 3-113, 3-119, 3-123

3-4,

3-8,

3-9

You might also like