Relationships Among Variables
Correlation and Regression
KNES 510
Research Methods in Kinesiology
Correlation
Correlation is a statistical technique used to determine the relationship between two or more variables.
We use two different techniques to determine score relationships:
1. a graphing technique
2. a mathematical technique called correlation
Graphs of the Relationship Between Variables
Types of Relationships
The scattergram can indicate a positive relationship, a negative relationship, or a zero relationship.
What are the characteristics of positive, negative, and zero relationships?
Mathematical Technique: The Correlation Coefficient (r)
The correlation coefficient, r*, represents the relationship between the z-scores of the subjects on two different variables (usually designated X and Y).
This can be stated mathematically as the mean of the z-score products for all subjects.
*A more complete name for this statistic is Pearson's product-moment correlation coefficient.
Formula for the Correlation Coefficient
The correlation coefficient can be calculated as follows:
$$r = \frac{\sum Z_X Z_Y}{N}$$
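The formula can be checked with a short Python sketch. It assumes the z-scores are computed with the population standard deviation (dividing by N rather than N - 1), which is the convention under which the mean of the z-score products equals Pearson's r exactly; the scores themselves are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Convert raw scores to z-scores using the population SD (divide by N)."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]


def pearson_r(x, y):
    """r = sum(Zx * Zy) / N: the mean of the z-score products."""
    if len(x) != len(y):
        raise ValueError("x and y must hold scores for the same subjects")
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


# Hypothetical scores for five subjects
x = [2, 4, 6, 8, 10]
y = [1, 3, 2, 5, 4]
print(round(pearson_r(x, y), 3))                # 0.8 (positive relationship)
print(round(pearson_r(x, [-v for v in y]), 3))  # -0.8 (reversing Y flips the sign)
```

Reversing the direction of Y changes only the sign of r, not its size, which is why a negative relationship can be just as strong as a positive one.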
The values of the coefficient always range from -1.00 to +1.00.
A correlation coefficient near 0.00 indicates no linear relationship.
SPSS Bivariate Correlation Output
Correlations
                            X       Y
X    Pearson Correlation    1       .947
     Sig. (2-tailed)                .053
     N                      4       4
Y    Pearson Correlation    .947    1
     Sig. (2-tailed)        .053
     N                      4       4
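The Sig. (2-tailed) value in this output can be reproduced, at least approximately, from r and N alone. The sketch below assumes the usual t test of a zero population correlation, t = r * sqrt(N - 2) / sqrt(1 - r^2) with N - 2 degrees of freedom, and approximates the area under the t distribution with Simpson's rule using only the Python standard library; it is an illustration, not SPSS's own routine.

```python
import math


def t_for_r(r, n):
    """t statistic for H0: population correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def t_pdf(t, df):
    """Probability density of Student's t distribution."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(r, n, steps=10_000):
    """Two-tailed p = 1 - 2 * (area under the t density from 0 to |t|).

    The area is approximated with Simpson's rule; steps must be even.
    """
    t = abs(t_for_r(r, n))
    df = n - 2
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 1 - 2 * area


print(round(two_tailed_p(0.947, 4), 3))  # X and Y table: about .053
print(two_tailed_p(0.981, 7) < 0.001)    # fat-free mass data: True (SPSS shows .000)
```

With only four subjects, even r = .947 does not reach the .05 level, which is why sample size matters as much as the size of r when reading this output.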
Interpreting the Correlation Coefficient
Because the relationship between two sets of data is seldom perfect, most correlation coefficients are fractions (0.92, -0.80, and the like).
When interpreting correlation coefficients, it is sometimes difficult to determine what is high, low, and average.
The Correlation Coefficient and Cause-and-Effect
There is a high correlation between a person's shoe size and their math skills in grades K through 6.
Is this an example of cause-and-effect?
Can we predict math skill based on shoe size in students in grades K through 6?
Coefficient of Determination
The coefficient of determination is the proportion of variability in one measure that is explained by the other measure.
The coefficient of determination is the square of the correlation coefficient ($r^2$).
For example, if the correlation coefficient between two variables is r = 0.90, the coefficient of determination is $(0.90)^2 = 0.81$; that is, 81% of the variability in one measure is explained by the other.
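The arithmetic is simple enough to script. This hypothetical helper squares r and reports the explained and unexplained proportions; r = 0.168 and r = 0.686 are the two job-performance correlations that appear later in these notes.

```python
def coefficient_of_determination(r):
    """Return r squared, the proportion of variance in one measure explained by the other."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.00 and +1.00")
    return r * r


for r in (0.90, 0.168, 0.686):
    r2 = coefficient_of_determination(r)
    print(f"r = {r:.3f}   r^2 = {r2:.3f}   explained = {r2:.1%}   unexplained = {1 - r2:.1%}")
```

Rounded to three decimals, 0.168 squared is .028 and 0.686 squared is .471, the same values SPSS reports under R Square in its Model Summary tables.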
Regression
When two variables are related (correlated), it is possible to predict a person's score on one variable (Y) from their score on the second variable (X).
This scatterplot illustrates a strong, positive relationship between fat-free body mass and daily energy expenditure.
Correlations
                                                  Fat-Free Mass (kg)   Energy Expenditure (kcal)
Fat-Free Mass (kg)          Pearson Correlation   1                    .981**
                            Sig. (2-tailed)                            .000
                            N                     7                    7
Energy Expenditure (kcal)   Pearson Correlation   .981**               1
                            Sig. (2-tailed)       .000
                            N                     7                    7
**. Correlation is significant at the 0.01 level (2-tailed).
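Regression Line (Line of Best Fit)
The regression line is the line that best describes the trend in the data.
It is placed as close as possible to the data points: the least-squares line minimizes the sum of the squared vertical distances between the points and the line.
The equation for this line is:
Y' = bX + C
where b is the slope and C is the Y intercept.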
Fitting a Regression Line
Simple Prediction
Tests have been developed to predict VO2max from the time it takes a person to run 1.5 miles.
A person's VO2max can thus be predicted from their 1.5-mile run time because a prediction or regression equation has been developed.
The simple linear prediction or regression equation takes the following form:
Y' = a + bX
Y' = predicted value
a = intercept of the regression line (Y intercept)
b = slope of the regression line (change in Y for each one-unit change in X)
X = score on the predictor variable
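The slope and intercept in Y' = a + bX are usually found by least squares. A minimal sketch, assuming ordinary least squares on a small set of hypothetical scores: b is the sum of cross-products of deviations divided by the sum of squared X deviations, and a places the line through the point (mean X, mean Y).

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares intercept a and slope b for Y' = a + bX."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # cross-products
    sxx = sum((xi - mx) ** 2 for xi in x)                     # squared X deviations
    b = sxy / sxx      # slope; equivalently r * (s_y / s_x)
    a = my - b * mx    # intercept: the line passes through (mean X, mean Y)
    return a, b


def predict(a, b, x):
    """Predicted score Y' for a predictor score X."""
    return a + b * x


# Hypothetical predictor (X) and criterion (Y) scores
x = [2, 4, 6, 8, 10]
y = [1, 3, 2, 5, 4]
a, b = fit_line(x, y)
print(f"Y' = {a:.2f} + {b:.2f}X")
print(f"Predicted Y for X = 7: {predict(a, b, 7):.2f}")
```

Writing the slope as r * (s_y / s_x) shows why prediction depends on correlation: when r = 0 the slope is 0 and every prediction is simply the mean of Y.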
Determining Error in Prediction
Unless two variables are perfectly related (r = -1.00 or +1.00), there will always be error associated with a prediction equation.
We find the standard deviation of this error, the standard error of prediction ($s_{yx}$), using the following formula:
$$s_{yx} = s_y\sqrt{1 - r^2}$$
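The formula can be checked numerically. In this sketch (hypothetical data, population standard deviations throughout), s_y * sqrt(1 - r^2) matches the standard deviation of the residuals Y - Y' from the least-squares line. Note that SPSS's "Std. Error of the Estimate" divides the residual sum of squares by N - 2 (N - k - 1 with k predictors) instead of N, so it runs slightly larger than this formula in small samples.

```python
from statistics import mean, pstdev


def standard_error_of_prediction(s_y, r):
    """s_yx = s_y * sqrt(1 - r^2), where s_y is the SD of the Y scores."""
    return s_y * (1 - r ** 2) ** 0.5


# Hypothetical data used only to check the formula
x = [2, 4, 6, 8, 10]
y = [1, 3, 2, 5, 4]

mx, my = mean(x), mean(y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)
r = sxy / (sxx * syy) ** 0.5
slope = sxy / sxx
intercept = my - slope * mx

# Residuals: actual Y minus predicted Y' from the least-squares line
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

s_yx = standard_error_of_prediction(pstdev(y), r)
print(round(s_yx, 4), round(pstdev(residuals), 4))  # the two values agree
```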
Prediction and Residuals
A predicted score plus or minus the standard error of prediction ($Y' \pm s_{yx}$) yields a range of scores within which a person's true score on the predicted variable is likely to lie.
If the standard error of prediction may be interpreted as the standard deviation of the residuals, what are the odds that a person's true score lies between $Y' - s_{yx}$ and $Y' + s_{yx}$?
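Assuming the residuals are normally distributed, the odds follow from the normal curve. This sketch uses `math.erf` to compute the proportion of a normal distribution lying within plus or minus k standard deviations of the mean; the function name `prob_within` is hypothetical.

```python
import math


def prob_within(k):
    """Proportion of a normal distribution within +/- k SDs of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 1.96, 2, 3):
    print(f"Y' +/- {k} s_yx covers about {prob_within(k):.1%} of true scores")
```

k = 1 gives about 68%, so roughly two out of three true scores fall within one standard error of prediction of Y'; k = 1.96 gives 95%.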
The standard error of prediction for percent body fat estimated using the skinfold method is 3.5%.
If a person's percent body fat is estimated at 12%, between what two values does their true percent body fat lie (95% probability)?
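Under the same normal-residual assumption, the 95% band is the predicted value plus or minus 1.96 standard errors of prediction. A minimal sketch; the helper name `prediction_band` is hypothetical.

```python
def prediction_band(predicted, see, z=1.96):
    """Return (low, high) = predicted -/+ z * see.

    Assumes normally distributed residuals; z = 1.96 gives about 95% coverage.
    """
    half_width = z * see
    return predicted - half_width, predicted + half_width


low, high = prediction_band(12.0, 3.5)   # estimate 12% body fat, SEE 3.5%
print(f"{low:.1f}% to {high:.1f}%")      # 5.1% to 18.9%
```

With the rounder multiplier z = 2, the band becomes 5% to 19%.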
Which of the following will more precisely predict job performance?
A: r = 0.168
B: r = 0.686
Sample SPSS Output
Here is the SPSS output for regressing Work Simulation Job Performance (Dependent Variable) against Supervisor Ratings (Independent Variable):

Coefficients(a)
Model 1              Unstandardized B   Std. Error   Standardized Beta   t         Sig.
(Constant)           -1.156             .675                             -1.712    .089
Supervisor Ratings   .033               .016         .168                2.053     .042
a. Dependent Variable: Work Simulation Job Performance
This information can be used to create a prediction (regression) equation for predicting the work performance of future applicants from supervisor ratings:

Y' = -1.156 + 0.033X
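The equation can be turned directly into a prediction function. This sketch copies the B column from the Coefficients table; the helper `make_equation` and the supervisor rating of 60 are hypothetical, since the rating scale is not given here.

```python
def make_equation(intercept, slope):
    """Return a function computing Y' = a + bX from SPSS's unstandardized B values."""
    def predict(x):
        return intercept + slope * x
    return predict


# B column of the Coefficients table: (Constant) = -1.156, Supervisor Ratings = .033
predict_from_ratings = make_equation(-1.156, 0.033)

rating = 60  # hypothetical supervisor rating for one applicant
print(round(predict_from_ratings(rating), 3))  # 0.824
```

Only the unstandardized B values belong in the prediction equation; the standardized Beta (.168) describes the relationship in z-score units and, with a single predictor, equals r.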

Work Simulation Job Performance may also be predicted from Arm Strength. Here is the SPSS output:

Coefficients(a)
Model 1              Unstandardized B   Std. Error   Standardized Beta   t         Sig.
(Constant)           -4.095             .392                             -10.454   .000
Arm Strength (lbs)   .055               .005         .686                11.353    .000
a. Dependent Variable: Work Simulation Job Performance
This information can be used to create a prediction (regression) equation for predicting the work performance of future applicants from arm strength:

Y' = -4.095 + 0.055X

We now have two regression equations for predicting Work Simulation Job Performance.
Which is the better equation for accurate prediction?
To determine this, we must examine the standard error of prediction for each equation.
Standard error of prediction using Supervisor Ratings:

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .168(a)   .028       .022                1.66078
a. Predictors: (Constant), Supervisor Ratings

Standard error of prediction using Arm Strength:

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .686(a)   .471       .467                1.22582
a. Predictors: (Constant), Arm Strength (lbs)

Which is the better equation?
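The comparison can be scripted with the values copied from the two Model Summary tables. Assuming normally distributed residuals, an approximate 95% band around Y' extends 1.96 standard errors of prediction on each side, so the smaller standard error gives the tighter band. The dictionary layout is only an illustration.

```python
# Values copied from the two SPSS Model Summary tables
models = {
    "Supervisor Ratings": {"r_square": 0.028, "see": 1.66078},
    "Arm Strength (lbs)": {"r_square": 0.471, "see": 1.22582},
}

for name, m in models.items():
    # Approximate 95% band: Y' - 1.96 SEE to Y' + 1.96 SEE (normal residuals assumed)
    width = 2 * 1.96 * m["see"]
    print(f"{name}: R^2 = {m['r_square']:.3f}, SEE = {m['see']:.3f}, band width = {width:.2f}")

# The equation with the smaller standard error of prediction is more precise
best = min(models, key=lambda name: models[name]["see"])
print("More precise equation:", best)
```

The band is about 6.51 units wide for supervisor ratings and about 4.81 units wide for arm strength, so the arm-strength equation predicts more precisely.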
Multiple Prediction
A prediction formula using a single measure X is usually not very accurate for predicting a person's score on measure Y.
Multiple correlation-regression techniques allow us to predict score Y using several X scores.
The general form of a two-predictor multiple regression equation is:
$$Y' = a + b_1X_1 + b_2X_2$$
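For two predictors, a, b1, and b2 can be found by least squares without any matrix library. This sketch assumes ordinary least squares, solves the two normal equations in deviation form with Cramer's rule, and checks itself on hypothetical data generated exactly from Y = 1 + 2X1 + 3X2.

```python
from statistics import mean


def fit_two_predictors(x1, x2, y):
    """Ordinary least-squares a, b1, b2 for Y' = a + b1*X1 + b2*X2.

    Solves the two normal equations written with deviation sums of squares
    and cross-products (Cramer's rule).
    """
    m1, m2, my = mean(x1), mean(x2), mean(y)
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(u * u for u in d1)
    s22 = sum(u * u for u in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    s1y = sum(u * v for u, v in zip(d1, dy))
    s2y = sum(u * v for u, v in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("X1 and X2 are perfectly correlated; b1 and b2 are not unique")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2


# Hypothetical data built exactly from Y = 1 + 2*X1 + 3*X2
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]
a, b1, b2 = fit_two_predictors(x1, x2, y)
print(f"Y' = {a:.2f} + {b1:.2f}X1 + {b2:.2f}X2")  # recovers a = 1, b1 = 2, b2 = 3
```

Each b is adjusted for the overlap (s12) between the two predictors, which is why a multiple regression weight usually differs from the slope that predictor would have on its own.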

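An example of multiple correlation-regression is the prediction of percent body fat from multiple skinfold measurements:
$$DB\,(\mathrm{g/cc}) = 1.0994921 - 0.0009929\,(3\mathrm{SKF}) + 0.0000023\,(3\mathrm{SKF})^2 - 0.0001392\,(\mathrm{age})$$
where DB is body density and 3SKF is the sum of three skinfold thicknesses (mm).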
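The equation can be evaluated directly. In this sketch the skinfold sum of 60 mm and the age of 30 years are hypothetical inputs. Converting body density to percent body fat is not shown on the slide; the sketch assumes the commonly used Siri equation, %fat = 495 / DB - 450.

```python
def body_density(sum_3skf_mm, age_years):
    """Body density (g/cc) from the three-skinfold equation above."""
    s = sum_3skf_mm
    return 1.0994921 - 0.0009929 * s + 0.0000023 * s ** 2 - 0.0001392 * age_years


def siri_percent_fat(density):
    """Assumed conversion (Siri equation): %fat = 495 / DB - 450."""
    return 495 / density - 450


db = body_density(60, 30)  # hypothetical: 60 mm skinfold sum, age 30
print(round(db, 5), round(siri_percent_fat(db), 1))
```

For these inputs DB is about 1.0440 g/cc and the estimated body fat is about 24.1%; the squared skinfold term lets the equation curve rather than follow a straight line.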
Next Class
Chapters 9 & 11
Mock Proposals in class!