
Normal Probability Plot

- 30001 Statistics -
Department of Decision Sciences
Bocconi University

1 Introduction

In this note we introduce a graph that is very popular in data analysis: the Normal Probability Plot
(NPP; see footnote 1). The NPP lets us check whether a given set of observations can plausibly be
treated as normal. In particular, we can use the NPP to determine whether the data under study can be
regarded as (at least approximately) normally distributed. This graphical check matters whenever we
use statistical procedures that assume normally distributed data. It is always a good idea to draw
the NPP before carrying out such analyses.

2 How to obtain the NPP

The necessary steps to build the NPP can be summarized as follows:

1. order the observations $(x_1, \ldots, x_n)$; we denote the ordered data by $(x_{(1)}, \ldots, x_{(n)})$,
2. calculate the cumulative frequency associated with each $x_{(i)}$ using the formula
   $p_i = \frac{i - 1/2}{n}$, with $i = 1, 2, \ldots, n$ (see footnote 2),
3. calculate the quantiles $z_{(i)}$ of order $p_i$ of a standard normal random variable, that is,
   find the values $z_{(i)}$ such that $\Phi(z_{(i)}) = p_i$, where $\Phi(z) = \Pr(Z \le z)$ is the
   cumulative distribution function of a standard normal variable $Z$ (see footnote 3),
4. put the values $z_{(i)}$ on the horizontal axis of the graph and the values $x_{(i)}$ on the
   vertical axis, for $i = 1, 2, \ldots, n$.
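
The four steps above can be sketched in Python as follows. This is only a minimal illustration and not part of the original note: the function name `npp_points` and the five sample values are hypothetical, and the standard normal quantiles are taken from `statistics.NormalDist().inv_cdf` in the Python standard library.

```python
from statistics import NormalDist


def npp_points(data):
    """Return the (z_(i), x_(i)) pairs to plot, following steps 1-4."""
    n = len(data)
    ordered = sorted(data)                              # step 1: order the data
    probs = [(i - 0.5) / n for i in range(1, n + 1)]    # step 2: p_i = (i - 1/2) / n
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    z = [std_normal.inv_cdf(p) for p in probs]          # step 3: standard normal quantiles
    return list(zip(z, ordered))                        # step 4: horizontal z, vertical x


# Hypothetical observations, used only to show the call.
sample = [4.1, 2.7, 3.3, 5.0, 3.9]
points = npp_points(sample)
for z_i, x_i in points:
    print(f"{z_i:8.4f}  {x_i:5.2f}")
```

Each printed row is one dot of the NPP: the first column goes on the horizontal axis and the second on the vertical axis.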

1 This graph is also referred to as the Normal Quantile Plot or the Q-Q plot.
2 Different software packages calculate the cumulative relative frequencies, $p_i$, in slightly
different ways. For example, PHSTAT uses the formula $p_i = i/(n + 1)$. However, if n is large enough
(at least 50), the discrepancies among the alternative formulas are negligible and the NPP does not
change significantly.
3 In Excel we can use the function =INV.NORM.ST(pi) to calculate the z(i). This is the function name
in the Italian-language version of Excel; the English-language equivalent is =NORM.S.INV(pi).
3 Interpretation

Once the graph has been created, we need to check whether the dots, obtained as explained in step 4
of Section 2, lie (at least approximately) on a line. If they do, we can conclude that the data have
(at least approximately) a normal distribution.
The idea behind the check for linearity is simple: if the data $x_i$ were normally distributed with
mean $\mu$ and standard deviation $\sigma$, then, by the well-known standardisation rule, we could
write $x_i = \mu + \sigma z_i$, which means that the data are a linear function of the $z_i$, with
intercept $\mu$ and slope $\sigma$ (see footnote 4).
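
The linearity argument can be checked numerically. The sketch below is an illustration and not the note's own procedure: it simulates normal data with hypothetical parameters (mean 10, standard deviation 2, a fixed seed chosen arbitrarily), builds the NPP coordinates as in Section 2, and fits a least-squares line to the dots. All variable names are assumptions made for this example.

```python
import random
from statistics import NormalDist, fmean

MU, SIGMA, N = 10.0, 2.0, 500      # hypothetical values chosen for illustration

rng = random.Random(12345)         # fixed seed so the run is reproducible
x = sorted(rng.gauss(MU, SIGMA) for _ in range(N))

std_normal = NormalDist()
z = [std_normal.inv_cdf((i - 0.5) / N) for i in range(1, N + 1)]

# Least-squares line x = intercept + slope * z through the NPP dots.
z_bar, x_bar = fmean(z), fmean(x)
s_zz = sum((zi - z_bar) ** 2 for zi in z)
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_zx = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))

slope = s_zx / s_zz
intercept = x_bar - slope * z_bar
r = s_zx / (s_zz * s_xx) ** 0.5    # correlation between z_(i) and x_(i)

print(f"intercept = {intercept:.3f}, slope = {slope:.3f}, r = {r:.4f}")
```

For normal data the fitted intercept and slope should be close to the mean and the standard deviation used in the simulation, and the correlation between the two coordinates should be close to 1.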
Let us consider the following notable cases:
a) if the data come from a normal distribution, the dots in the NPP will follow a line, as illustrated
in Figure 1.a,
b) if the data come from a symmetric but non-normal distribution, we have to distinguish the
following situations:
b.1) if the data distribution has heavier tails than a normal distribution, the NPP will look
as in Figure 1.b1: in this case, although the central part of the distribution is well
approximated by the normal distribution, both tails are heavier,
b.2) if the data distribution is uniform, the NPP will look as in Figure 1.b2,
b.3) if the data are bimodal, the NPP will look as in Figure 1.b3,
c) if the data distribution is right-skewed, the NPP will look as in Figure 1.c; in this case the
first 25% of the data (that is, the data below the first quartile, highlighted in red in the graph)
are more concentrated than they would be under a normal distribution, while the final 25% of the
data (that is, the data above the third quartile, highlighted in red in the graph) are more spread
out than in the normal case,
d) if the data distribution is left-skewed, the NPP will look as in Figure 1.d; in this case the
first 25% of the data (that is, the data below the first quartile, highlighted in red in the graph)
are more spread out than they would be under a normal distribution, while the final 25% of the data
(that is, the data above the third quartile, highlighted in red in the graph) are more concentrated
than in the normal case.
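
The shapes described in these cases can be reproduced with a small numerical sketch. Several assumptions are made here that are not in the original note: instead of random samples the code uses "ideal" samples placed exactly at the quantiles of each distribution; a logistic distribution stands in for the heavy-tailed case (the Student t quantile function is not available in the Python standard library); the reference line is drawn through the first and third quartile points; and the function name `tail_deviations` is hypothetical. The bimodal case is left out because its quantile function has no simple closed form.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

# Quantile functions F^{-1}(p) for a few reference shapes (illustrative choices).
QUANTILES = {
    "normal": STD_NORMAL.inv_cdf,
    "heavy tails (logistic)": lambda p: math.log(p / (1.0 - p)),
    "uniform": lambda p: p,
    "right-skewed (exponential)": lambda p: -math.log(1.0 - p),
    "left-skewed (mirrored exponential)": lambda p: math.log(p),
}


def tail_deviations(quantile, n=100):
    """Vertical distance of the lowest and highest NPP dots from the line
    through the quartile points (positive means the dot is above the line)."""
    z_q3 = STD_NORMAL.inv_cdf(0.75)
    q1, q3 = quantile(0.25), quantile(0.75)
    slope = (q3 - q1) / (2.0 * z_q3)
    intercept = (q1 + q3) / 2.0      # the quartile z-values are symmetric about 0

    def deviation(p):
        return quantile(p) - (intercept + slope * STD_NORMAL.inv_cdf(p))

    # p_1 and p_n from the formula p_i = (i - 1/2) / n
    return deviation(0.5 / n), deviation((n - 0.5) / n)


for name, quantile in QUANTILES.items():
    low, high = tail_deviations(quantile)
    print(f"{name:36s} low end {low:+.2f}   high end {high:+.2f}")
```

With z on the horizontal axis and x on the vertical axis, the signs tell the story: heavy tails put the lowest dot below the line and the highest dot above it, the uniform case does the opposite, a right-skewed distribution puts both ends above the line, and a left-skewed distribution puts both ends below it.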

4 Recall that if $X \sim N(\mu, \sigma)$, then $Z = (X - \mu)/\sigma \sim N(0, 1)$ and, conversely, if
$Z \sim N(0, 1)$, then $X = \mu + \sigma Z \sim N(\mu, \sigma)$.
Figure 1 The NPP structure for some notable cases.

1.a) Normal distribution
1.b1) Symmetric but non-normal distribution (Student's t)
1.b2) Symmetric but non-normal distribution (uniform)
1.b3) Symmetric but non-normal distribution (bimodal)
1.c) Right-skewed distribution
1.d) Left-skewed distribution


4 Examples

4.1 Distribution of daily returns for a sample of investment funds.

Figure 2 shows the NPP for the daily returns of a sample of 190 investment funds. Even though the left
tail of the distribution differs slightly from that of a normal distribution (because of a few funds
with highly negative performances), we can reasonably say that the distribution of daily returns is
approximately normal, because the dots in the NPP lie quite close to a line.

4.2 Distribution of returns for a quoted stock.

Figure 3 shows the NPP for a sample of 471 daily returns of a stock quoted on the Milan Stock
Exchange. From the graph we can see that the distribution of returns is quite symmetric but
non-normal: the dots do not lie on a line. In particular, this example corresponds to case b.1
outlined in the previous section. It is reasonable to say that, in this case, the stock returns come
from a symmetric distribution with heavier tails than a normal distribution. The normality
hypothesis is therefore not adequate here.

4.3 Distribution of CEO remunerations.

Figures 4 and 5 show, respectively, the NPP of salaries and of benefits for a sample of 667 CEOs of
large US firms. Both distributions are right-skewed, but the asymmetry is milder for salaries and
more pronounced for benefits. In both cases it is not reasonable to say that the data are (even
approximately) normally distributed.

Figure 2 NPP for daily returns of a sample of investment funds.

Figure 3 NPP for the returns of a quoted stock.

Figure 4 NPP for the salaries of a sample of CEOs.

Figure 5 NPP for the benefits of a sample of CEOs.

Appendix: PHSTAT

With PHSTAT we can produce an NPP with the following steps:

PHStat -> Probability & Prob. Distributions -> Normal Probability Plot.

Recall that, regardless of the sample size n, PHSTAT always calculates the cumulative frequencies
$p_i$ as $p_i = i/(n + 1)$.
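
The two conventions for the cumulative frequencies that appear in this note, (i - 1/2)/n and i/(n + 1), can be compared directly. The following is a minimal sketch, not part of PHSTAT or of the original note; the function names `p_midpoint`, `p_phstat` and `max_gap` are hypothetical.

```python
def p_midpoint(i, n):
    """Cumulative frequency used in Section 2: (i - 1/2) / n."""
    return (i - 0.5) / n


def p_phstat(i, n):
    """Cumulative frequency used by PHSTAT: i / (n + 1)."""
    return i / (n + 1)


def max_gap(n):
    """Largest absolute difference between the two formulas over i = 1..n."""
    return max(abs(p_midpoint(i, n) - p_phstat(i, n)) for i in range(1, n + 1))


for n in (10, 50, 500):
    print(f"n = {n:3d}: largest difference in p_i = {max_gap(n):.4f}")
```

The largest difference occurs at the two extreme observations and equals (n - 1) / (2n(n + 1)), which is about 0.04 for n = 10 and falls below 0.01 once n reaches 50.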
