
Six Sigma Green Belt Training Program Statistics - Version 1.0

Normal Distribution and Probability Distribution Function


Normal Distribution is a continuous probability distribution completely specified by two
parameters, mean (μ) and standard deviation (σ).

Probability Distribution Function (PDF)
A continuous variable may assume any value within a given interval. So the probability of getting
a value of the variable at a particular point is 0. We are interested in the probability of occurrence
of the variable within a particular interval.

Hence we now define a function called the Probability Distribution Function (PDF), more commonly
known as the probability density function, in such a way that the area under the distribution curve
between the end points of an interval gives the probability of occurrence of the variable within
that interval. The total area under the curve (for the range of the variable) is 1.
Let x be a continuous variable that assumes any value within the interval (a, b), where b > a, and
let p(x) be the PDF.
To find the probability (P) of occurrence of x within the interval from c to d, we integrate the PDF
between those limits:

$$P(c < x < d) = \int_{c}^{d} p(x)\,dx$$

Likewise, integrating over the whole interval from a to b gives the total region under the curve,
which is 1 because the total probability is 1:

$$\int_{a}^{b} p(x)\,dx = 1$$
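
The two integrals can be checked numerically. The following is a minimal sketch, assuming a normal PDF with a hypothetical mean of 50 and standard deviation of 5 (these values are not from the original material). The helper name `area_under_pdf` is likewise illustrative. Because the normal curve extends indefinitely in both directions, the sketch uses the range 0 to 100 (ten standard deviations either side of the mean) as a stand-in for (a, b).

```python
from statistics import NormalDist


def area_under_pdf(pdf, c, d, steps=10_000):
    """Approximate the integral of pdf from c to d with the trapezoidal rule."""
    width = (d - c) / steps
    total = 0.5 * (pdf(c) + pdf(d))
    for i in range(1, steps):
        total += pdf(c + i * width)
    return total * width


# Hypothetical distribution used only for illustration.
dist = NormalDist(mu=50, sigma=5)

# P(45 < x < 55): area under the PDF between c = 45 and d = 55.
p_interval = area_under_pdf(dist.pdf, 45, 55)

# Total area: a = 0 and b = 100 stand in for the full range of x.
p_total = area_under_pdf(dist.pdf, 0, 100)

print(f"P(45 < x < 55) = {p_interval:.4f}")
print(f"Total area     = {p_total:.4f}")
```

The interval here spans one standard deviation either side of the mean, so the first figure should come out close to 0.68, and the second close to 1.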

A probability distribution curve is a plot of the PDF against the variable x. The curve is
classified as a Normal distribution if it has the following characteristics:

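[Figure: bell-shaped curve of the PDF (vertical axis) against X (horizontal axis), with the Mean marked at the peak]
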
1. It is symmetric about the mean. Thus, Median = Mean for Normal Distribution.
2. The PDF is maximum at the mean. So, for an interval of any given width, the probability is
greatest when the interval is centered on the mean. Hence Mode = Mean for Normal Distribution,
i.e. Mean = Median = Mode.


Tata Consultancy Services


Importance of Normal Distribution

- Normal Distribution has many important algebraic properties, for which it is used in statistical
  theories. Many physical, biological and psychological measurements are found to follow
  Normal Distribution, at least approximately.
- Statistical Quality Control methods and the Theory of Errors of observations in physical
  measurements are also based on normal distribution.
- Normal Distribution is also used as an approximation of other distributions.
- In sampling theory, it is found that many statistics (a statistic is a statistical measure
  computed on sample observations) based on a large sample approximately follow normal
  distribution. This result considerably simplifies the task of testing statistical hypotheses.


Central Limit Theorem

The Central Limit Theorem gives us another idea of the importance of Normal Distribution. It
states that, regardless of the shape of a population, the distributions of sample means and
proportions are approximately normal for large sample sizes.

If samples of size n are drawn randomly from a population that has mean μ and standard deviation σ,
the sample means x̄ are approximately normally distributed for sufficiently large sample sizes
(n >= 30 is the usual rule of thumb), regardless of the shape of the population distribution. If the
population is normally distributed, the sample means are normally distributed for any sample size.
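
The theorem can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the population is a skewed exponential distribution with mean 1 and standard deviation 1, and the seed, the sample size of 30 and the count of 5000 samples are all arbitrary choices, not values from the original material. It also relies on the standard result, not stated above, that the sample means have standard deviation σ/√n.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

n = 30              # size of each sample
num_samples = 5000  # number of samples drawn

# Skewed population: exponential with mean 1 and standard deviation 1.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# Fraction of sample means within one standard deviation of their mean.
# For a normal distribution this fraction is about 68%.
within_one_sd = sum(
    abs(m - mean_of_means) <= sd_of_means for m in sample_means
) / num_samples

print(f"mean of sample means: {mean_of_means:.3f} (population mean 1)")
print(f"SD of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {1 / n ** 0.5:.3f})")
print(f"within +/- 1 SD:      {within_one_sd:.1%}")
```

Although the population itself is far from bell-shaped, the sample means should cluster around the population mean with roughly the spread and the one-SD coverage expected of a normal distribution.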

Area Under the Normal Curve

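The area under the Normal Curve is divided as follows, with the horizontal axis marked from -3s to
+3s about the mean (0):

Within +/- 1s of the mean: 68.26%
Within +/- 2s of the mean: 95.46%
Within +/- 3s of the mean: 99.73%

Between the mean and 1s, on each side: 34.13%
Between 1s and 2s, on each side: 13.60%
Between 2s and 3s, on each side: 2.14%
Beyond 3s, on each side: 0.13%
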

Note:
1. Approximately 68% of all the values in a normally distributed population lie within +/- 1
Standard Deviation of the mean.
2. Approximately 95.5% of all the values in a normally distributed population lie within +/- 2
Standard Deviations of the mean.
3. Approximately 99.7% of all the values in a normally distributed population lie within +/- 3
Standard Deviations of the mean.
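
These percentages can be reproduced from the cumulative distribution of the standard normal curve. The sketch below uses Python's `statistics.NormalDist`. The helper name `area_within` is an illustrative choice, and the last digit of the output may differ from printed tables because of rounding.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def area_within(k):
    """Area under the normal curve within +/- k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"+/- {k} SD: {area_within(k):.2%}")
```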

This result is especially useful in the theory of Statistical Quality Control. It implies that if a
random variable X is normally distributed with mean m and standard deviation s, then 99.73% of
the values lie within the limits (m - 3s, m + 3s). These limits are called the lower and upper
control limits.
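
As a small arithmetic sketch, the limits can be computed directly from m and s. The function name `control_limits` is illustrative only. The inputs are the mean and SD quoted for the CR resolution-time example elsewhere in this material (48.62 and 7.06 person days), used here purely as sample numbers.

```python
def control_limits(mean, sd, k=3):
    """Return (lower, upper) limits at mean - k*sd and mean + k*sd."""
    return mean - k * sd, mean + k * sd


# Sample inputs: mean and SD from the CR resolution-time example.
lcl, ucl = control_limits(48.62, 7.06)
print(f"LCL = {lcl:.2f}, UCL = {ucl:.2f}")
```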


Standard Normal Distribution

Standard tables are available to calculate the probability (area under normal curve) for a given
interval. But as it is not possible to have tables for all values for Mean and SD, we use the table of
Standard Normal Distribution.
It has Mean = 0 and SD = 1.

To calculate the probability using standard tables, we have to transform the Normal Variable to a
Standard Normal Variable. A Standard Normal Variable is the transformed variable
Z = (x - mean)/SD

We can now refer to the tables to find out the probability value corresponding to our calculated Z
value.







Z value is the number of Standard Deviations which will fit between Mean and a
given value of the variable x.



How to evaluate Z value and corresponding probability

Now, for a given distribution, let's find the probability of an event occurring below a
specification limit of value x.

Example

In our example #1, the mean resolution time for a CR is 48.62 person days and the SD is 7.06 person
days. Assuming the data follows a normal distribution, let's find the probability that a CR will
be resolved in less than 40 person days.

Z = (40 - 48.62)/7.06 = -1.22

In the Standard Normal Distribution table, the probability value corresponding to Z = -1.22 is
0.1112, so there is roughly an 11% chance that a CR is resolved in less than 40 person days.
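
The same calculation can be sketched in Python, with `statistics.NormalDist` standing in for the printed table. The variable names are illustrative. The second result skips the rounding of Z to two decimals, which is why it differs slightly from the table value.

```python
from statistics import NormalDist

mean, sd, x = 48.62, 7.06, 40

# Transform x to a Standard Normal Variable.
z = (x - mean) / sd

# A printed table is entered with Z rounded to two decimals;
# cdf gives the area under the curve to the left of that Z.
p_table = NormalDist().cdf(round(z, 2))

# The same probability computed directly, without rounding Z.
p_exact = NormalDist(mu=mean, sigma=sd).cdf(x)

print(f"Z = {z:.2f}")
print(f"P from rounded Z = {p_table:.4f}")
print(f"P without rounding = {p_exact:.5f}")
```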


COMPUTER HANDS-ON
Excel Workout:
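1. Use the function NORMDIST (named NORM.DIST in Excel 2010 and later).
2. Specify the specification limit x as 40, the Mean as 48.62, and the
SD as 7.06. Specify the fourth (Boolean) argument as TRUE so that the function returns the
cumulative probability.
3. Evaluate: =NORMDIST(40,48.62,7.06,TRUE) = 0.11105
The result differs slightly from the table value of 0.1112 because Excel does not round Z to two
decimal places.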

Process Capability and Sigma Value

Z is a very important metric to assess process capability. Given the upper and lower specification
limits (USL and LSL) for our data, the corresponding Z values are:

Z(USL) = (USL - MEAN)/SD
and Z(LSL) = (LSL - MEAN)/SD

Now, from the Standard Normal Distribution table we find the area under the normal distribution
outside the specification limits. This area is the region of defect.

Yield = 1 - {(Area under curve and left of Z(LSL)) + (Area under curve and right of Z(USL))}

Obtain the process capability from the sigma table corresponding to this yield.
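
A minimal sketch of the yield calculation follows. The mean and SD are taken from the CR example, but the specification limits (LSL = 30, USL = 60) are hypothetical values chosen only for illustration. The sketch stops at the yield, since the sigma table lookup depends on the particular table in use.

```python
from statistics import NormalDist

# Mean and SD from the CR example. The specification limits are
# hypothetical and serve only to illustrate the calculation.
mean, sd = 48.62, 7.06
lsl, usl = 30.0, 60.0

z_lsl = (lsl - mean) / sd
z_usl = (usl - mean) / sd

std_normal = NormalDist()
area_left_of_lsl = std_normal.cdf(z_lsl)        # defects below the LSL
area_right_of_usl = 1 - std_normal.cdf(z_usl)   # defects above the USL

process_yield = 1 - (area_left_of_lsl + area_right_of_usl)

print(f"Z(LSL) = {z_lsl:.2f}, Z(USL) = {z_usl:.2f}")
print(f"Region of defect = {1 - process_yield:.4f}")
print(f"Yield = {process_yield:.4f}")
```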


Normality Plot

We can make a guess at whether a distribution is normal by plotting the percentage cumulative
probability on the y-axis (which is not scaled linearly) against the data range on the x-axis. For a
more accurate analysis we need to perform a normality test.

For a normal distribution, our measured points will lie more or less on a diagonal straight line,
which is the plot for a perfect normal distribution.

Points outside the zone bounded by the two other lines drawn alongside the straight line
deviate from the Normal Distribution. These lines are known as 95% Confidence Limits. They
mark how far points may stray from the straight line at the 95% confidence level.
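
The idea behind the plot can be sketched without a plotting package: pair each sorted observation with the standard normal quantile for its cumulative probability, then measure how close the pairs are to a straight line. Everything in this sketch is an assumption for illustration. That covers the plotting position (i - 0.5)/n, the use of a correlation coefficient as the straightness measure, the function names and the simulated data. It does not draw the confidence limits that Minitab adds.

```python
import random
from statistics import NormalDist, fmean


def probability_plot_points(data):
    """Pair each standard normal quantile with the matching sorted observation."""
    ordered = sorted(data)
    n = len(ordered)
    # Plotting position (i - 0.5) / n is one common convention (assumed here).
    quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))


def straightness(points):
    """Pearson correlation of the plotted points; 1.0 is a perfect straight line."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Simulated data for illustration only.
rng = random.Random(1)
normal_data = [rng.gauss(48.62, 7.06) for _ in range(200)]
skewed_data = [rng.expovariate(1 / 48.62) for _ in range(200)]

r_normal = straightness(probability_plot_points(normal_data))
r_skewed = straightness(probability_plot_points(skewed_data))

print(f"normal data: r = {r_normal:.3f}")
print(f"skewed data: r = {r_skewed:.3f}")
```

The normally distributed sample should give a value very close to 1, while the skewed sample bends away from the line and gives a visibly lower value.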



COMPUTER HANDS-ON
Minitab workout for Example #1
1. From the Graph Menu, select Probability Plot
2. Select the column you will be using to plot.
3. Click Options and ensure the confidence limit is 95%. Click OK in the parent and child
dialog boxes.


Normality Test

There are hypothesis tests to decide whether the given data is normal or not. You can use the
Anderson-Darling Test in Minitab and check the p-value. If it is more than 0.05, there is not
enough evidence to reject normality and you can treat the distribution as normal; otherwise
consider the data to be non-normal.
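
For readers without Minitab, the decision rule can be sketched in Python. This is a rough sketch under assumptions, not a description of Minitab's implementation. It computes the Anderson-Darling statistic with the mean and SD estimated from the data, applies the usual small-sample adjustment, and converts it to a p-value with the piecewise approximation attributed to D'Agostino and Stephens. The function name and both data sets are hypothetical, and the p-values may differ from Minitab's output.

```python
import math
from statistics import NormalDist, fmean, stdev


def anderson_darling_normal(data):
    """Return (adjusted A-squared, approximate p-value) for a normality test."""
    x = sorted(data)
    n = len(x)
    mean, sd = fmean(x), stdev(x)
    # Standardize with the sample mean and SD, then apply the normal CDF.
    u = [NormalDist().cdf((v - mean) / sd) for v in x]
    s = sum(
        (2 * i + 1) * (math.log(u[i]) + math.log(1 - u[n - 1 - i]))
        for i in range(n)
    )
    a2 = -n - s / n
    # Adjustment for estimating the mean and SD from the sample.
    a2_adj = a2 * (1 + 0.75 / n + 2.25 / n**2)
    # Piecewise p-value approximation (assumed, see the text above).
    if a2_adj >= 0.6:
        p = math.exp(1.2937 - 5.709 * a2_adj + 0.0186 * a2_adj**2)
    elif a2_adj > 0.34:
        p = math.exp(0.9177 - 4.279 * a2_adj - 1.38 * a2_adj**2)
    elif a2_adj > 0.2:
        p = 1 - math.exp(-8.318 + 42.796 * a2_adj - 59.938 * a2_adj**2)
    else:
        p = 1 - math.exp(-13.436 + 101.14 * a2_adj - 223.73 * a2_adj**2)
    return a2_adj, p


# Hypothetical data: evenly spaced quantiles of a normal and of a skewed
# (exponential) distribution, so the example needs no random numbers.
n = 50
positions = [(i - 0.5) / n for i in range(1, n + 1)]
normal_like = [NormalDist(48.62, 7.06).inv_cdf(q) for q in positions]
skewed = [-48.62 * math.log(1 - q) for q in positions]

for name, sample in (("normal-like", normal_like), ("skewed", skewed)):
    a2_adj, p = anderson_darling_normal(sample)
    verdict = "treat as normal" if p > 0.05 else "treat as non-normal"
    print(f"{name}: A2* = {a2_adj:.3f}, p = {p:.4f}, {verdict}")
```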

COMPUTER HANDS-ON
(Microsoft Excel/ Minitab)
In Minitab, the summary graph (see Histogram) overlays a comparable Normal Curve on the
histogram plot and also shows the Anderson-Darling Test results.

Since p > 0.05, we accept the distribution as a normal one.