
5.

Scatter Diagram
Overview
A scatter diagram is a visual representation of the
relationships between selected variables.
In other words, a scatter diagram helps us
determine whether two variables are correlated
with one another.
And when we say two things are correlated, we're
basically saying they are related to one another.
In the example to the right, we see on-time
delivery and customer satisfaction plotted in a scatter diagram, and there definitely seems to
be a rather strong positive correlation. In other words, as on-time delivery increases, so too does
customer satisfaction.
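
To put a number on what the eye sees in a scatter diagram, we can compute the Pearson correlation coefficient, usually written r. The sketch below is only an illustration: the function name pearson_r and the delivery and satisfaction figures are invented for this example and are not the data behind the chart in this module.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data, invented for illustration only.
on_time_delivery_pct = [70, 75, 80, 85, 90, 95]
satisfaction_score = [6.1, 6.5, 7.2, 7.4, 8.3, 8.8]

r = pearson_r(on_time_delivery_pct, satisfaction_score)
print(f"r = {r:.3f}")  # values near +1 suggest a strong positive correlation
```

Values of r close to +1 or -1 point to a strong positive or negative linear relationship, while values near 0 point to little or no linear relationship.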

Correlation Does Not Always Mean Causation


Now then, if you remember back to the first overview module of this course, we learned an
extremely important concept, namely that correlation does not automatically mean there's
causation.
The example we used in that module was how it can be shown that as the number of liquor
stores in a neighborhood increases, so does the number of churches built. So, one could conclude that if
a town hoped to have more churches, all it needs to do is build more liquor stores, right?
Of course this is complete nonsense, since liquor stores and churches aren't causally related
at all. A better explanation for this situation is that as a town grows in population, there'll be some
residents who like to visit liquor stores and some who like to attend church.
So the key point I'd like to stress here is to not get too excited when you think you've identified
correlated variables. Instead, once we've identified possible correlation, we should always follow the
11th commandment of continuous improvement, which is simply "thou shall confirm."
In other words, we need to set up an experiment as best we can in order to verify that our assumption of
correlation is valid.
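
The liquor store and church example can be mimicked with a small simulation. The sketch below assumes made-up towns in which both counts depend only on population, each with its own independent noise; the rates, noise levels, and seed are arbitrary choices for illustration and not real figures. The raw counts come out strongly correlated, yet once we divide by population the relationship largely disappears.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(42)  # fixed seed so the run is repeatable

populations, liquor_stores, churches = [], [], []
for _ in range(500):  # 500 hypothetical towns
    population = rng.uniform(1_000, 100_000)
    populations.append(population)
    # Each count depends only on population plus its own independent noise.
    liquor_stores.append(population / 2_000 * (1 + rng.uniform(-0.2, 0.2)))
    churches.append(population / 1_500 * (1 + rng.uniform(-0.2, 0.2)))

# Raw counts: both grow with population, so they move together.
r_raw = pearson_r(liquor_stores, churches)

# Per-capita rates: the shared driver (population) is divided out.
stores_per_capita = [s / p for s, p in zip(liquor_stores, populations)]
churches_per_capita = [c / p for c, p in zip(churches, populations)]
r_per_capita = pearson_r(stores_per_capita, churches_per_capita)

print(f"raw counts:       r = {r_raw:.2f}")
print(f"per-capita rates: r = {r_per_capita:.2f}")
```

Dividing by population is just one simple way to account for the shared driver here. It does not replace the confirming experiment described above.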

9. Scatter Diagram


R²
While the graphical display of the scatter diagram helps us judge how much correlation exists,
there's actually a far more precise way to determine the level of correlation.
This method has us examining an important statistic known as R squared, which measures the
proportion of variation explained by the model.
What this basically means is that high R squared values tell us the two variables being studied are
highly correlated.
The next logical question is: how high is high? Generally speaking, R squared values greater than 0.8
indicate strong correlation, while R squared values between 0.5 and 0.8 generally indicate moderate
correlation. Finally, when we see R squared values less than 0.5, it's generally safe to say that little to no correlation
exists within this particular study.
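
As a rough illustration of how R squared is calculated and read, the sketch below fits a least-squares straight line and computes R squared as 1 minus the ratio of unexplained variation to total variation. The function names r_squared and describe_strength and the sample data are assumptions made for this example; the cut-offs are simply the rules of thumb quoted above.

```python
def r_squared(xs, ys):
    """R squared of a least-squares straight-line fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation left unexplained by the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation of ys around their mean.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def describe_strength(r2):
    """Apply the rule-of-thumb cut-offs from this module."""
    if r2 > 0.8:
        return "strong"
    if r2 >= 0.5:
        return "moderate"
    return "little to no"


# Hypothetical data, invented for illustration only.
on_time_delivery_pct = [70, 75, 80, 85, 90, 95]
satisfaction_score = [6.1, 6.5, 7.2, 7.4, 8.3, 8.8]

r2 = r_squared(on_time_delivery_pct, satisfaction_score)
print(f"R squared = {r2:.3f} ({describe_strength(r2)} correlation)")
```

For a straight-line fit with a single input variable, R squared is the square of the correlation coefficient r, so the two statistics always agree on the strength of the relationship.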

Building Predictive Models


Scatter diagrams can help us build predictive models. By using statistical software such as SigmaXL,
we are presented with a formula as shown below. If our model is adequate, meaning R squared is at
least 0.80, we can leverage the formula to build predictive equations as shown in the example below.
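
For readers without statistical software at hand, the same idea can be sketched in a few lines of Python. This is not SigmaXL's output or method, just a generic least-squares straight-line fit. The helper names fit_line and predict, the sample data, and the choice to refuse predictions when R squared falls below 0.80 are all assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


def predict(x, slope, intercept, r2, min_r2=0.80):
    """Predict y for a new x, but only if the model is deemed adequate."""
    if r2 < min_r2:
        raise ValueError(f"R squared {r2:.2f} is below {min_r2:.2f}")
    return intercept + slope * x


# Hypothetical data, invented for illustration only.
on_time_delivery_pct = [70, 75, 80, 85, 90, 95]
satisfaction_score = [6.1, 6.5, 7.2, 7.4, 8.3, 8.8]

slope, intercept, r2 = fit_line(on_time_delivery_pct, satisfaction_score)
print(f"satisfaction = {intercept:.3f} + {slope:.4f} * on_time_pct")
print(f"R squared = {r2:.3f}")
print(f"predicted satisfaction at 88% on time: "
      f"{predict(88, slope, intercept, r2):.2f}")
```

Predictions are most trustworthy inside the range of the data used to fit the line; here that would be on-time delivery between 70 and 95 percent.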
