
The Normal Distribution

=====================================
Review of Probability Distributions

Discrete probability distributions: Binomial and Poisson

The Binomial probability distribution :

$$P(X = r) = \frac{n!}{r!\,(n-r)!}\; p^{r} (1-p)^{n-r}$$
for r = 0, 1, 2, …, n.
Here n! (n factorial) is n! = n (n-1) (n-2) … (1),
and 0! = 1 by definition.
p may be known a priori or it may be estimated empirically
as p = r/n, the ratio of the number of successes observed in
the n trials to the number of trials.
Properties of the binomial distribution :
•Characterised by n and p.
•Mean = np
•Variance = np(1-p)
•Standard deviation = √(np(1-p))
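
The binomial formula and its properties can be checked numerically. The following is a minimal sketch in Python; the function name `binomial_pmf` and the values n = 10, p = 0.3 are illustrative assumptions, not taken from the text.

```python
from math import comb, sqrt

def binomial_pmf(r, n, p):
    """P(X = r) for n independent trials with success probability p."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 10, 0.3  # illustrative values (assumed, not from the text)
pmf = [binomial_pmf(r, n, p) for r in range(n + 1)]

# Mean and variance computed directly from the probabilities
mean = sum(r * pr for r, pr in enumerate(pmf))
variance = sum((r - mean) ** 2 * pr for r, pr in enumerate(pmf))

print("total probability:", round(sum(pmf), 6))
print("mean:", round(mean, 6), "vs np =", n * p)
print("variance:", round(variance, 6), "vs np(1-p) =", n * p * (1 - p))
print("standard deviation:", round(sqrt(variance), 6))
```

The directly computed mean and variance should agree with np and np(1-p) up to rounding error.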
The Poisson probability distribution :
$$P(X = r) = \frac{e^{-\lambda}\,\lambda^{r}}{r!}$$ for r = 0, 1, 2, …,

where e ≈ 2.718 is the base of natural (Napierian) logarithms
and λ may be known a priori or it may be estimated from the
sample by X̄ (the arithmetic mean).

Properties of the Poisson distribution :


•Characterised by λ.
•Mean = λ.
•Variance = λ.
•Standard deviation = √λ.
•The Poisson distribution approximates the binomial
distribution when the number of trials, n, is large and the
probability of success in a single trial, p, is small; with λ = np.
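
The approximation in the last property can be illustrated with a short Python sketch. The function names and the values n = 1000, p = 0.002 are assumptions chosen for illustration only.

```python
from math import comb, exp, factorial

def poisson_pmf(r, lam):
    """P(X = r) for a Poisson variable with mean lam."""
    return exp(-lam) * lam**r / factorial(r)

def binomial_pmf(r, n, p):
    """P(X = r) for a binomial variable with n trials and success probability p."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 1000, 0.002   # large n, small p (illustrative values)
lam = n * p          # matching Poisson mean, lambda = np

# Compare the two distributions for r = 0, ..., 10
for r in range(11):
    print(r, round(binomial_pmf(r, n, p), 5), round(poisson_pmf(r, lam), 5))

max_diff = max(abs(binomial_pmf(r, n, p) - poisson_pmf(r, lam)) for r in range(11))
print("largest difference:", max_diff)
```

With these values the two columns should agree closely; the agreement generally worsens as p grows.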
The Normal Distribution
The Normal distribution (Gaussian distribution) is a very
important probability distribution in statistics.
The distributions of many medical measurements in
populations approximate the normal in shape (e.g. serum
uric acid level, cholesterol level, blood pressure, height,
and weight).

The important descriptive properties are :


1. It is a distribution of a continuous variable, say X.
2. It is a bell-shaped curve.
3. It is symmetrical about its mean, μ.
4. The area under the curve is equal to 1.
5. It is determined by two quantities, its mean μ and its
standard deviation σ.
Changing μ merely shifts the whole curve to the left or
right. Increasing σ makes the curve flatter and more
spread out.
6. The approximate probability between the limits:
μ - σ and μ + σ is 0.68
μ - 2σ and μ + 2σ is 0.95
μ - 3σ and μ + 3σ is 0.997
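
The three probabilities in property 6 can be checked with Python's standard-library `statistics.NormalDist`. This is a small sketch; the mean and standard deviation used are arbitrary illustrative values, since the result is the same for any normal distribution.

```python
from statistics import NormalDist

mu, sigma = 50, 10  # arbitrary illustrative values (assumed)
dist = NormalDist(mu, sigma)

# Probability of falling within k standard deviations of the mean
probs = {}
for k in (1, 2, 3):
    probs[k] = dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)
    print(f"within {k} SD: {probs[k]:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```

The quoted figures 0.68, 0.95 and 0.997 are rounded versions of these values.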
In practice the probability distributions of the variables
we observe are not known, but if the distribution is bell-
shaped and reasonably symmetrical about the mean, the
Normal distribution can be used. Note, however,
that observations made on normal (healthy?) individuals
do not necessarily follow a Normal distribution.
Given a random variable X that can take on any
value between negative and positive infinity (-∞
and +∞), the formula for the normal distribution
is as follows:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-0.5\left(\frac{x-\mu}{\sigma}\right)^{2}}$$

where π ≈ 3.1416. The function depends only on
the mean μ and standard deviation σ because
they are the only components that vary.
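
The density formula can be written out directly in code. The sketch below is one possible Python version; the name `normal_pdf` and the values of the mean and standard deviation are illustrative assumptions. It is compared against the standard library's `NormalDist.pdf` as a sanity check.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return exp(-0.5 * z ** 2) / (sigma * sqrt(2 * pi))

mu, sigma = 50, 10  # illustrative values (assumed)
for x in (30, 50, 70):
    # The hand-written formula and the library value should match
    print(x, normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x))
```

Note that the heights at 30 and 70 coincide, reflecting the symmetry of the curve about the mean.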
Because the area under the curve is equal to 1, we can
use the curve for calculating probabilities. For example,
to find the probability that an observation falls between
a and b on the curve, we integrate the preceding equation
between a and b, where -∞ is given the value a and +∞
is given the value b. (Integration is a mathematical technique
in calculus used to find the area under a curve.)
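
To make the idea of integration concrete, the sketch below approximates the area between a and b by summing thin trapezoids under the curve, then compares it with the value from the cumulative distribution function. The helper name `area_by_trapezoids`, the step count, and the limits a = 0.5, b = 1.5 are assumptions for illustration.

```python
from statistics import NormalDist

def area_by_trapezoids(dist, a, b, steps=10_000):
    """Approximate the area under dist's density curve between a and b."""
    width = (b - a) / steps
    total = 0.5 * (dist.pdf(a) + dist.pdf(b))   # end points count half
    for i in range(1, steps):
        total += dist.pdf(a + i * width)        # interior points count fully
    return total * width

dist = NormalDist(mu=0, sigma=1)
a, b = 0.5, 1.5  # illustrative limits (assumed)

numeric = area_by_trapezoids(dist, a, b)
exact = dist.cdf(b) - dist.cdf(a)   # area via the cumulative distribution
print(numeric, exact)
```

The two numbers should agree to several decimal places, which is why tables (or a cumulative distribution function) can replace explicit integration in practice.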

Since the values of μ and σ will depend on the particular
problem in hand and tables of the Normal distribution cannot
be published for all values of μ and σ, calculations are made
by referring to the Standard Normal Distribution, which
has μ = 0 and σ = 1.
SND = Z = (X - μ) / σ
Table A-2 gives the area under the curve between 0 and
+Z.
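
The transformation to Z and the kind of area a table tabulates can be sketched in Python as follows. The function names `z_score` and `table_area`, and the numbers used, are illustrative assumptions; `table_area` mimics a table of areas between 0 and +Z rather than reproducing any particular published table.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(0, 1)

def z_score(x, mu, sigma):
    """Convert an observation x to a standard normal value Z."""
    return (x - mu) / sigma

def table_area(z):
    """Area under the standard normal curve between 0 and z (for z >= 0)."""
    return STANDARD_NORMAL.cdf(z) - 0.5

# Illustrative values (assumed): x = 75 from a distribution with mean 50, SD 10
z = z_score(75, mu=50, sigma=10)
print(z, round(table_area(z), 4))
```

Subtracting 0.5 works because half of the total area lies to the left of 0.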
Example: If thermometers have an average (mean)
reading of 0 degrees and a standard deviation of 1
degree for freezing water, and if one thermometer is
randomly selected, find the probability that, at the
freezing point of water, it reads between 0 degrees
and 1.58 degrees.
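
A worked sketch of this example follows. Since the mean is 0 and the standard deviation is 1, the readings are already in Z units. The symbol Φ, denoting the cumulative area to the left of z under the standard normal curve, is notation introduced here and does not appear in the original.

```latex
\[
P(0 < Z < 1.58) = \Phi(1.58) - \Phi(0) \approx 0.9429 - 0.5000 = 0.4429
\]
```

A table of areas between 0 and +Z should give the value 0.4429 directly at Z = 1.58.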
Using Symmetry to Find the Area to the Left of the Mean
Because of symmetry, the area between -z and 0 equals
the area between 0 and +z.

NOTE: Although a z score can be negative, the area
under the curve (or the corresponding probability)
can never be negative.
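
The symmetry argument can be checked numerically. In this small Python sketch, the value 1.58 is reused from the thermometer example, and the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist(0, 1)
z = 1.58

# Area to the left of the mean, between -z and 0
left_area = std_normal.cdf(0) - std_normal.cdf(-z)
# Area to the right of the mean, between 0 and +z
right_area = std_normal.cdf(z) - std_normal.cdf(0)

print(left_area, right_area)
```

Both areas are positive and equal (up to floating-point rounding), even though the z score on the left is negative.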
Exercises :

1. If Z has a standard normal distribution, find

a. P (Z > 1.64)
b. P (Z < -1.64)
c. P (1.0 < Z < 1.5)
d. P (-1.0 < Z < 2.0)
e. P (-2.0 < Z < 2.0)
f. P (-1.64 < Z < -1.02)
g. P (0 < Z < 1.96)

2. Suppose that scores on an aptitude test are normally
distributed about a mean μ = 60 with a standard deviation
σ = 20. What proportion of the scores :
a. Exceed 85 ?
b. Fall below 50 ?
3. Assume that systolic blood pressure (BP) in normal healthy
individuals is normally distributed with μ = 120 and
σ = 10 mmHg. Make the appropriate transformations to
answer the following questions. (Hint: Make sketches of
the distribution to be sure you are finding the correct
area.)
a. What area of the curve is above 130 mmHg ?
b. What area of the curve is above 140 mmHg ?
c. What area of the curve is between 100 and 140 mmHg?
d. What area of the curve is above 150 mmHg ?
e. What area of the curve is either below 90 mmHg or
above 150 mmHg ?
f. What is the value of the systolic blood pressure that
divides the area under the curve into the lower 95 %
and the upper 5 % ?
4. Assume that among diabetics the fasting blood level of
glucose is approximately normally distributed with a
mean μ = 105 mg per 100 ml and standard deviation
σ = 9 mg per 100 ml.

a. What proportion of diabetics have levels between
90 and 125 mg per 100 ml ?
b. What level cuts off the lower 10 percent of diabetics ?
5. The values of serum sodium in healthy adults approximately
follow a normal distribution with a mean of 141
mEq/L and standard deviation of 3 mEq/L.
a. What is the probability that a normal healthy adult will
have a serum sodium value above 147 mEq/L ?
b. What is the probability that a normal healthy adult will
have a serum sodium value below 130 mEq/L ?
c. What is the probability that a normal healthy adult will
have a serum sodium value between 132 and 150
mEq/L ?
d. What serum sodium level is necessary to put someone
in the top 1 % of the distribution ?
e. What serum sodium level is necessary to put someone
in the bottom 10 % of the distribution ?
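
Answers to exercises of this kind can be checked with a few helper functions. The sketch below is one possible approach in Python; the function names and the mean of 100 and standard deviation of 15 are hypothetical and deliberately not taken from any exercise above.

```python
from statistics import NormalDist

def proportion_above(dist, x):
    """Area under the curve to the right of x."""
    return 1.0 - dist.cdf(x)

def proportion_between(dist, low, high):
    """Area under the curve between low and high."""
    return dist.cdf(high) - dist.cdf(low)

def cutoff_for_lower(dist, fraction):
    """Value below which the given fraction of the distribution lies."""
    return dist.inv_cdf(fraction)

dist = NormalDist(mu=100, sigma=15)  # hypothetical values, not from the exercises

print(round(proportion_above(dist, 115), 4))         # above the mean + 1 SD
print(round(proportion_between(dist, 85, 130), 4))   # from -1 SD to +2 SD
print(round(cutoff_for_lower(dist, 0.95), 2))        # lower 95 % / upper 5 % split
```

For a "top 1 %" or "bottom 10 %" question, `cutoff_for_lower` would be called with 0.99 or 0.10 respectively.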