
6 NON-PARAMETRIC TESTS

6.05 Spearman correlation

Non-parametric procedures are not only available for hypothesis tests; there are also
non-parametric ways to measure, for instance, the association between variables. The most important of
these is the Spearman rank correlation coefficient, which is often treated as the non-parametric
counterpart of the Pearson correlation coefficient. In this video I will explain when to apply, how to
calculate and how to interpret the Spearman correlation coefficient.

A correlation coefficient is a standardized measure to express the degree to which two variables are
associated. The standardization implies that it has a fixed range over which it varies and that it does not
change when you change, for example, the units of the variables by adding a constant or
multiplying by a positive constant. Most correlation coefficients vary between minus one and plus one. The Pearson
correlation coefficient, also called the product moment correlation coefficient, is the coefficient that
is used most frequently. It measures a linear association between two numerical variables. If you
would like to test the Pearson correlation coefficient, for example to evaluate whether it is different
from zero, you have to additionally assume that the two variables are bivariate normally distributed,
which implies that a scatter plot of the data shows an approximate ellipsoidal shape and that each of
the variables separately follows a normal distribution. In general, the Pearson correlation coefficient
is sensitive to outliers and to skewness of the distribution of one or both variables. The Spearman
correlation coefficient is a good replacement for the Pearson correlation if one of these conditions
applies to your variables:
- one or both of the variables are ordinal rather than numerical
- they are not linearly related
- they contain one or more outliers, or
- they don’t follow a bivariate normal distribution, or you cannot check this due to lack of data.
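To make the two properties of Pearson's coefficient mentioned above concrete, here is a minimal sketch using only the Python standard library. The height and weight values are made-up illustration data, not data from this video, and `pearson` is a helper name chosen for this example. It shows that changing units leaves the coefficient unchanged, while a single outlier can change it a lot.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson product moment correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data (assumption for illustration only)
height_cm = [150, 160, 165, 172, 180, 191]
weight_kg = [52, 58, 63, 70, 77, 88]

r = pearson(height_cm, weight_kg)

# Change of units: centimetres to inches (multiplication by a positive
# constant) and a shift of every weight by a constant.
height_in = [h / 2.54 for h in height_cm]
weight_shifted = [w + 10 for w in weight_kg]
r_rescaled = pearson(height_in, weight_shifted)
print(abs(r - r_rescaled) < 1e-9)  # True: the coefficient is unchanged

# One extra data pair with an extreme weight pulls the coefficient down.
r_outlier = pearson(height_cm + [170], weight_kg + [150])
print(r_outlier < r)  # True
```

With these numbers the coefficient drops from close to one to well below one half after adding the single outlier, which is the kind of sensitivity that motivates a rank-based alternative.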

To understand Spearman’s correlation it is necessary to know what a monotonic function is. A
monotonic function is one that either never increases or never decreases as its independent variable
increases. The following graphs illustrate monotonic functions. At the left you see a monotonically
decreasing function, in the middle a monotonically increasing function and at the right a function
that is not monotonic. Spearman’s correlation measures the strength of a monotonic relationship
between paired data. This implies that the Spearman correlation coefficient for this data would be
+1 …, but also for this data …., or even this data… And the same applies of course to negatively
related data. You can apply Spearman’s correlation coefficient to both ordinal and numerical
variables, and to interpret it you assume that these variables are monotonically related. But to test
it, for example to see whether its value is significantly different from zero, there is no requirement
on the distribution of the data like bivariate normality in the case of Pearson’s correlation. It is
therefore a non-parametric statistic. Spearman’s correlation is calculated by first ranking the
variables, whereby average ranks are assigned in the case of ties, and subsequently calculating
Pearson’s correlation on the ranked values of this data. Because it works on ranked data,
Spearman’s correlation coefficient is also called the rank correlation coefficient.
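The calculation just described, ranking with average ranks for ties and then applying Pearson's correlation to the ranks, can be sketched as follows. This is a minimal standard-library sketch, not an optimized implementation. The function names `average_ranks`, `pearson` and `spearman` and the example data are assumptions made for this illustration.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product moment correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]

# Made-up data with a monotonic but non-linear relation (y = x cubed).
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]
print(round(spearman(x, y), 6))  # 1.0
print(pearson(x, y) < 1.0)       # True: the relation is not linear
```

Because the relation is perfectly monotonic, the ranks of x and y are identical and Spearman's coefficient is +1, while Pearson's coefficient stays below one.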

Let’s look at an example. Here the number of ingredients and the price at which a cake is being sold
are shown. This is Pearson’s correlation coefficient, expressing the strength of the linear relation
between these variables. To calculate Spearman’s correlation coefficient we first determine the
ranks for each variable, using an average of the ranks for tied values. And next, we apply the same
correlation-equation to the ranked data to obtain Spearman’s correlation, expressing the strength of
the monotonic relation between the two variables.
As you see Spearman’s correlation coefficient is higher. Let’s have a look at the shape of the
relationship … As you see the relation is non-linear. Up to around ten ingredients, the price
increases strongly, but beyond this point the number of ingredients does not influence price much
anymore. In this case it would be better to use Spearman’s correlation coefficient to express the
strength of a relationship. These are other cases where Spearman’s correlation coefficient would be
preferred over Pearson’s coefficient... the cases where you have ordinal data or one or more
outliers. Otherwise, the interpretation of Spearman’s correlation coefficient is very similar to that of
Pearson’s coefficient, and visualising the relationship between the variables is crucial. A
correlation coefficient of zero does not, for instance, imply that there is no relation: consider this case
where there is a clear parabolic relation between two variables. Furthermore, a correlation
coefficient is influenced by both the effect size and the spread around the relation that is being
evaluated. So even though these correlation coefficients are the same, the data tell a different
story. In the graphs at the right there is more scatter, but at the same time a larger effect size, that is,
a larger change in Y with a change in X.
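The point that a coefficient of zero does not mean "no relation" can be checked with a small sketch. The data below are made up for illustration: a symmetric parabola, y equal to x squared. The helper names are assumptions for this example. Both coefficients come out as zero even though y is completely determined by x.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product moment correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up parabolic data, symmetric around x = 0
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

print(round(pearson(x, y), 6))   # 0.0
print(round(spearman(x, y), 6))  # 0.0
```

The relation is neither linear nor monotonic, so neither coefficient picks it up; only a plot of the data reveals it.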

Like any sample statistic, correlation coefficients can be tested, and confidence intervals can also
be calculated. The most frequently applied test is to evaluate if a value for a correlation coefficient is
significantly different from zero. The null hypothesis for this test states that the correlation
coefficient as a population parameter is zero. The alternative hypothesis can be two-sided, that
it is different from zero, or one-sided, that it is either higher than zero or lower than zero. There are several
equations to describe the sampling distribution of the Spearman correlation, as well as statistical
tables which give the cumulative probabilities associated with a value of the correlation coefficient
and the number of data pairs used for its calculation. We will not go further into the details of
testing or the calculation of confidence intervals for the correlation coefficient here. The procedure
is equivalent to that of a one-sample proportion.
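For readers who want to try a test themselves, one possible approach is a permutation test. Note that this is a swapped-in technique, not the equations or tables referred to above. The sketch below shuffles the ranks of one variable many times to approximate the sampling distribution of the coefficient under the null hypothesis of no association. The data are made up (loosely modelled on the cake example; the real values are not given in this transcript), and the function names, the number of permutations and the seed are arbitrary choices for this illustration.

```python
import random
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product moment correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_permutation_test(x, y, n_perm=2000, seed=0):
    """Two-sided permutation test of H0: no association between x and y."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    rx, ry = average_ranks(x), average_ranks(y)
    observed = pearson(rx, ry)
    shuffled = ry[:]
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any pairing between x and y
        if abs(pearson(rx, shuffled)) >= abs(observed) - 1e-12:
            extreme += 1
    # Add one to numerator and denominator so the p-value is never exactly 0.
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical data (assumption): number of ingredients and cake price
ingredients = [3, 4, 5, 6, 8, 10, 12, 15, 18, 20]
price = [4.0, 5.5, 7.0, 9.5, 12.0, 14.0, 14.5, 14.2, 15.0, 14.8]

r_s, p = spearman_permutation_test(ingredients, price)
print(round(r_s, 4), p)
```

A small p-value means that a coefficient at least as far from zero as the observed one rarely arises when the pairing between the two variables is random. For a one-sided alternative you would compare the signed values instead of the absolute values.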

I hope you understood the following from this video:

- The Spearman correlation coefficient measures the degree of association between two
variables with ordinal or quantitative measurement levels. Its value ranges from minus 1 for
a perfect negative association to plus 1 for a perfect positive association.
- It is calculated by first ranking each of the variables and then determining the Pearson
correlation coefficient for these ranks.
- In comparison to the Pearson correlation coefficient, the Spearman coefficient is less
sensitive to outliers, but on the downside also about 10 percent less powerful.
- Even though the interpretation of the two correlation coefficients is roughly the same,
measuring a degree of association, they are quite different when you consider the details:
the Pearson correlation measures the degree of linear association, whereas the Spearman
correlation coefficient measures the degree of monotonic association, which can be both
linear and non-linear.
