
Academic Skills

www.brunel.ac.uk/library/ask

Christine Pereira Academic Skills Adviser ask@brunel.ac.uk

Session Overview
Levels of Measurement
What is Linear Regression?
Simple Linear Regression in SPSS
Multiple Linear Regression in SPSS

Categorical vs. Quantitative Variables

Levels of Measurement
Types of Data

Categorical (Qualitative)
  Nominal (unranked categories): Marital Status, Political Party, Eye Color
  Ordinal (ranked categories): Satisfaction level, Level of agreement

Scale (Quantitative, not grouped)
  Age in years, Time, Weight, Height, No. of cars, No. of students

The level of measurement determines whether or not you can use linear regression.

Predicting the outcome of one or more variables

Two types of LINEAR regression

Simple Linear Regression
One variable is used to predict the value of another variable.

Multiple Linear Regression
Two or more variables are used to predict the value of another variable.

Level of Measurement?
Dependent variable must be scale.
Independent variable(s) must be scale or dichotomous.
Dichotomous = ordinal or nominal with exactly 2 groups.
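
The rule above can be sketched as a small check. This is only an illustration: the function names and the way a variable is described (a level plus a number of groups) are assumptions made for this example, not part of SPSS.

```python
# Hypothetical helper illustrating the level-of-measurement rule above.
SCALE = "scale"
ORDINAL = "ordinal"
NOMINAL = "nominal"


def is_dichotomous(level, n_groups):
    """An ordinal or nominal variable with exactly 2 groups."""
    return level in (ORDINAL, NOMINAL) and n_groups == 2


def can_use_linear_regression(dv_level, ivs):
    """dv_level: level of the dependent variable.
    ivs: list of (level, n_groups) pairs; n_groups is None for scale variables.
    """
    if dv_level != SCALE:
        return False
    return all(level == SCALE or is_dichotomous(level, n) for level, n in ivs)


# Scale DV with a scale IV and a 2-group nominal IV: allowed.
print(can_use_linear_regression(SCALE, [(SCALE, None), (NOMINAL, 2)]))
# A nominal IV with 3 groups is not dichotomous: not allowed.
print(can_use_linear_regression(SCALE, [(NOMINAL, 3)]))
# An ordinal DV is not scale: not allowed.
print(can_use_linear_regression(ORDINAL, [(SCALE, None)]))
```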

Identifying which variable depends on the others

Variables from a sample


We collect information about different variables:
e.g. gender, age, level of satisfaction, annual income, etc

Independent Variables
Their value does not depend on the value of another variable.
Their values are used to explain the values of another variable.

Dependent Variables
Their value depends upon the value of another variable.
Their values can be explained (in part) by the values of another variable.

It is our job to determine how much one variable depends on another, if at all.

Exercise 1. Identify the independent & dependent variable.


General happiness and age.
Gender and general happiness.
Annual income and job satisfaction.
Annual income and rated skill of work.

Remember to ask: Does ____ effect any change in ____?

Predicting the outcome of the dependent variable with one independent variable

Simple Linear Regression

Estimating the best line to draw through the points.
How do we know which line is best?
[Figure: scatter plot of No. of students attending lectures against Daily Temperature (°C)]

Simple Linear Regression

The best line has the smallest total error.
Error = Predicted value - Actual value
[Figure: the same scatter plot with a fitted line, marking the actual value, the predicted value and the error for one point]
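
In ordinary least squares, the usual way of fitting this kind of line, "total error" means the sum of the squared errors. The sketch below fits a line that way and checks that shifting or tilting the line only makes the total larger. The temperatures and attendance figures are made up for illustration and are not the data behind the slide.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def total_squared_error(xs, ys, intercept, slope):
    """Sum of squared (predicted - actual) differences for a given line."""
    return sum(((intercept + slope * x) - y) ** 2 for x, y in zip(xs, ys))


# Made-up illustrative data (not taken from the slides).
temps = [-5, 0, 5, 10, 15, 20, 25]
students = [2530, 2545, 2555, 2570, 2590, 2600, 2615]

b0, b1 = fit_line(temps, students)
best = total_squared_error(temps, students, b0, b1)
shifted = total_squared_error(temps, students, b0 + 5, b1)
tilted = total_squared_error(temps, students, b0, b1 + 0.5)

print(f"fitted line: intercept={b0:.2f}, slope={b1:.2f}")
print(f"total squared error: best={best:.2f}, shifted={shifted:.2f}, tilted={tilted:.2f}")
```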

Simple Linear Regression

This line has a mathematical equation:
$\hat{y} = \beta_0 + \beta_1 x$

$\hat{y}$ is the DV (No. of students attending lectures).
$x$ is the IV (Daily Temperature, °C).
$\beta_0$ is the Y-intercept: the predicted value when $x = 0$.
$\beta_1$ is the steepness of the line: + value: the line increases; - value: the line decreases.
$\beta_0$ and $\beta_1$ are estimated using regression.
[Figure: scatter plot of No. of students attending lectures against Daily Temperature (°C), with the fitted line]

Simple Linear Regression

This line has a mathematical equation:
$\hat{y} = 2543 + 2.9x$

The Y-intercept is 2543: the predicted number of students when the temperature is 0 °C.
The steepness of the line is 2.9: a + value, so the line increases.
[Figure: scatter plot of No. of students attending lectures against Daily Temperature (°C), with the fitted line]

Simple Linear Regression

This line has a mathematical equation:
$\hat{y} = 2543 + 2.9x$

If the temperature is 15 °C, how many students would we predict to be on campus?

$\hat{y} = 2543 + 2.9 \times 15 \approx 2586$
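
The same prediction can be written as a one-line function. The function name is made up for this sketch; the intercept and slope are the rounded values shown on the slide, which give 2586.5 for 15 °C (reported on the slide as about 2586).

```python
def predict_students(temp_c, intercept=2543, slope=2.9):
    """Predicted number of students for a given daily temperature in °C,
    using the rounded coefficients from the slide."""
    return intercept + slope * temp_c


print(predict_students(15))  # 2543 + 2.9 * 15
print(predict_students(0))   # at 0 °C the prediction is the intercept itself
```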

A few simple steps


1. Scatter plot using both variables
The relationship should be linear (a straight line)

2. The correlation between the DV and IV should be significant (use Pearson's r)
If they are not significantly correlated, then the IV is not a good predictor of the DV

3. Simple linear regression in SPSS

4. Determine if the model is significant
If it's significant, how good is it?

5. Determine if the coefficients are significant

6. State final conclusions

Example 2. Predict Mean Satisfaction given Income


Dependent variable = Mean Satisfaction
Independent variable = Gross Annual Income

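Use regression to estimate an equation which uses income to predict how satisfied employees are with their job:
$\hat{y} = \beta_0 + \beta_1 x$
$\hat{y}$ is the DV (Mean Satisfaction) and $x$ is the IV (Gross Annual Income).
$\beta_0$ and $\beta_1$ are estimated using regression.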

Example 2. Predict Mean Satisfaction given Income


Dependent variable = Mean Satisfaction
Independent variable = Gross Annual Income

1 Scatter plot of Mean Satisfaction and Income
[Figure: scatter plot with axes labelled Independent (Gross Annual Income) and Dependent (Mean Satisfaction)]
Circular? Looks like no correlation. Let's check step 2.

Example 2. Predict Mean Satisfaction given Income


Dependent variable = Mean Satisfaction
Independent variable = Gross Annual Income

2 Pearson's correlation between both variables

p > 0.05: Non-significant correlation. Can't use regression.

Example 3. Predict Income given Years worked


Dependent variable = Gross Annual Income
Independent variable = Number of years worked
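$\hat{y} = \beta_0 + \beta_1 x$
$\hat{y}$ is the DV (Gross Annual Income) and $x$ is the IV (Number of years worked).
$\beta_0$ and $\beta_1$ are estimated using regression.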

Example 3. Predict Income given Years worked


Dependent variable = Gross Annual Income
Independent variable = Number of years worked

1 Scatter plot for Income and Years worked
[Figure: scatter plot with axes labelled Independent (Years worked) and Dependent (Income)]

Example 3. Predict Income given Years worked


Dependent variable = Gross Annual Income
Independent variable = Number of years worked

2 Pearson's correlation between both variables

p < 0.05: Significant correlation. Can use regression.
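
For readers who want to see what sits behind this step, here is a minimal sketch of Pearson's r and the t statistic used to test it. SPSS reports the p-value directly; this sketch stops at the t statistic, which is compared against a t distribution with n - 2 degrees of freedom. The data and the sample size of 100 are made up for illustration; the slides do not state the sample size.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: no linear correlation (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Made-up illustrative data: years worked and income.
years = [1, 2, 4, 6, 9, 12]
income = [7500, 7400, 7600, 7550, 7800, 7750]

r = pearson_r(years, income)
print(f"r = {r:.3f}, t = {t_statistic(r, len(years)):.3f}")

# The same statistic for r = .354 with a hypothetical sample of 100 people.
print(f"t = {t_statistic(0.354, 100):.3f}")
```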

Linear Regression in SPSS

Example 3. Predict Income given Years worked


3 Linear Regression in SPSS

Leave the Method as "Enter" unless you have a good reason to change it.

Example 3. Predict Income given Years worked


4 Is the model significant?
H0: The model using Years worked IS NOT a significant predictor of Income
H1: The model using Years worked IS a significant predictor of Income

p < 0.05: Reject H0 in favour of H1

Example 3. Predict Income given Years worked


4 It's significant, but how good is it?

R in the Model Summary table is the correlation coefficient for Income and Years worked.

Evaluating Strength. Numerically.

Correlation coefficients are between -1 and 1.
The sign of the correlation coefficient gives the direction: + values mean a positive relationship, - values mean a negative relationship.

Strength             + values        - values
Strong               0.5 to 1.0      -1.0 to -0.5
Moderate             0.3 to 0.49     -0.49 to -0.3
Weak                 0.1 to 0.29     -0.29 to -0.1
Very weak or None    0 to 0.09       -0.09 to 0

.354 is moderate
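
The bands above can be turned into a small lookup. This is a sketch with a made-up function name. It treats each band's lower limit as a threshold, so a value such as 0.495 that falls between two printed bands is placed in the lower one; that choice is an assumption of this example.

```python
def correlation_strength(r):
    """Return (strength, direction) for a correlation coefficient,
    using the bands from the slide as thresholds."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient must be between -1 and 1")
    size = abs(r)
    if size >= 0.5:
        strength = "strong"
    elif size >= 0.3:
        strength = "moderate"
    elif size >= 0.1:
        strength = "weak"
    else:
        strength = "very weak or none"
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return strength, direction


print(correlation_strength(0.354))
print(correlation_strength(-0.62))
```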

Example 3. Predict Income given Years worked


4 It's significant, but how good is it?

.125 * 100% = 12.5%
12.5% of the variation in Income can be explained by the number of Years worked.

That means 87.5% is still unexplained!
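
In simple linear regression the .125 is the square of the correlation coefficient (.354 squared is about .125). A short sketch of the arithmetic, with a made-up function name:

```python
def variation_explained(r):
    """Return (% explained, % unexplained) for a correlation coefficient r,
    rounded to one decimal place."""
    r_squared = r ** 2
    explained = round(r_squared * 100, 1)
    unexplained = round(100 - explained, 1)
    return explained, unexplained


explained, unexplained = variation_explained(0.354)
print(f"{explained}% explained, {unexplained}% unexplained")
```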

Example 3. Predict Income given Years worked

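Income = 7399.84 + 33.08 * YearsWorked

$\beta_0$ has the same units as the DV; $\beta_1$ is in units of the DV per unit of the IV. Here, the DV (Gross Annual Income) is an amount of money, so $\beta_0$ is an amount of money and $\beta_1$ is an amount of money per year worked.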

Example 3. Predict Income given Years worked

Income = 7399.84 + 33.08 * YearsWorked

5 Are the coefficients significant?

H0: $\beta_0$ is NOT significant, it should be 0.
H1: $\beta_0$ IS significant.

p < 0.05: Reject H0 in favour of H1.

Example 3. Predict Income given Years worked

Income = 7399.84 + 33.08 * YearsWorked

5 Are the coefficients significant?

H0: Years worked is not a sig. predictor of income. $\beta_1$ should be 0.
H1: Years worked is a sig. predictor of income. $\beta_1$ is not 0.

p < 0.05: Reject H0 in favour of H1.

Example 3. Predict Income given Years worked


6 Final conclusions
Using the number of years worked as a model to predict income is statistically significant.
The model can explain 12.5% of the variation in income. There must be other factors affecting income, not just years worked.
The Y-intercept and IV, years worked, are significant to the model.

Example 3. Predict Income given Years worked


6 Final conclusions
The final model is:

Income = 7399.84 + 33.08 * YearsWorked

Interpreting the model:
The average starting salary is 7399.84 (no years worked).
On average, employees' income increases by 33.08 for every year worked for the company.

Predicting the outcome of the dependent variable with two or more independent variables

A few simple steps


1. Scatter plot between the dependent variable and each independent variable.
Is the relationship linear?

2. Pearson's correlation between all variables
Dependent and independent variables should be significantly correlated.
Independent variables should NOT be significantly correlated with each other.
If 2 independent variables are significantly correlated, only use ONE of them in the regression model.
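
The check in step 2 amounts to looking at every pair of independent variables. The sketch below lists Pearson's r for each pair so that strongly related pairs stand out; deciding significance is left to the p-values SPSS reports. The variable names and data are made up for illustration (age is included only as an example of an IV that overlaps with years worked).

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def iv_pair_correlations(ivs):
    """ivs: dict mapping IV name to its list of values.
    Returns {(name_a, name_b): r} for every pair of IVs."""
    return {(a, b): pearson_r(ivs[a], ivs[b]) for a, b in combinations(ivs, 2)}


# Made-up illustrative data for three candidate IVs.
ivs = {
    "years_worked": [1, 3, 5, 7, 9, 11],
    "mean_autonomy": [3, 1, 4, 2, 5, 3],
    "age": [23, 26, 27, 30, 33, 34],
}

pairs = iv_pair_correlations(ivs)
# Print the most strongly related pairs first.
for (a, b), r in sorted(pairs.items(), key=lambda item: -abs(item[1])):
    print(f"{a} vs {b}: r = {r:.3f}")
```

In this made-up data, years worked and age move together almost perfectly, so only one of them would go into the model.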

A few simple steps


3. Multiple linear regression in SPSS

4. Determine if the model is significant
If it's significant, how good is it?

5. Determine if the coefficients are significant

6. State final conclusions

Example 4.
Predict Income given Years worked & Mean Autonomy Score

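Dependent variable = Gross Annual Income
Independent variables = Number of years worked & Mean Autonomy Score

$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
$x_1$ is Number of years worked and $x_2$ is Mean Autonomy Score.

Use regression to estimate these coefficients ($\beta_0$, $\beta_1$ and $\beta_2$).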
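
To show what "estimating the coefficients" involves, here is a minimal least-squares sketch for exactly two independent variables, using the standard closed-form solution. It is not how SPSS is implemented. The data are made up and constructed to lie exactly on y = 2 + 3*x1 + 5*x2, so the recovered coefficients can be checked by eye.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares estimates (b0, b1, b2) for y = b0 + b1*x1 + b2*x2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Sums of squares and cross-products around the means.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        # The two IVs are perfectly correlated, so no unique solution exists.
        raise ValueError("independent variables are perfectly correlated")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Made-up data lying exactly on y = 2 + 3*x1 + 5*x2.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [2 + 3 * a + 5 * b for a, b in zip(x1, x2)]

b0, b1, b2 = fit_two_predictors(x1, x2, y)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}")
```

The error raised when the two IVs are perfectly correlated is the extreme case of the problem that the correlation check between independent variables is meant to catch.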

Example 4.
Predict Income given Years worked & Mean Autonomy Score

Dependent variable = Gross Annual Income
Independent variables = Number of years worked & Mean Autonomy Score

1 Scatter plot for Income and Years worked
[Figure: scatter plot with axes labelled Independent (Years worked) and Dependent (Income)]

Example 4.
Predict Income given Years worked & Mean Autonomy Score

Dependent variable = Gross Annual Income
Independent variables = Number of years worked & Mean Autonomy Score

1 Scatter plot for Income and Mean Autonomy
[Figure: scatter plot with axes labelled Independent (Mean Autonomy) and Dependent (Income)]

Example 4.
Predict Income given Years worked & Mean Autonomy Score

Dependent variable = Gross Annual Income
Independent variables = Number of years worked & Mean Autonomy Score

2 Pearson's correlation between all variables

Both independent variables should be significantly correlated with the dependent variable (p < 0.05)

Example 4.
Predict Income given Years worked & Mean Autonomy Score

Dependent variable = Gross Annual Income
Independent variables = Number of years worked & Mean Autonomy Score

2 Pearson's correlation between all variables

The two independent variables should not be significantly correlated with each other (p > 0.05)

Linear Regression in SPSS

Example 4.
Predict Income given Years worked & Mean Autonomy Score

3 Linear Regression in SPSS

Leave the Method as "Enter" unless you have a good reason to change it.

Example 4.
Predict Income given Years worked & Mean Autonomy Score

4 Is the model significant?
H0: The model using Years worked & Mean Autonomy IS NOT a significant predictor of Income
H1: The model using Years worked & Mean Autonomy IS a significant predictor of Income

p < 0.05: Reject H0 in favour of H1

Example 4.
Predict Income given Years worked & Mean Autonomy Score

4 It's significant, but how good is it?

R is the correlation between Income values predicted by the model and the actual Income values from the sample.

Evaluating Strength. Numerically.

Correlation coefficients are between -1 and 1.
The sign of the correlation coefficient gives the direction: + values mean a positive relationship, - values mean a negative relationship.

Strength             + values        - values
Strong               0.5 to 1.0      -1.0 to -0.5
Moderate             0.3 to 0.49     -0.49 to -0.3
Weak                 0.1 to 0.29     -0.29 to -0.1
Very weak or None    0 to 0.09       -0.09 to 0

.482 is moderate

Example 4.
Predict Income given Years worked & Mean Autonomy Score

4 It's significant, but how good is it?

.232 * 100% = 23.2%
23.2% of the variation in Income can be explained by the number of Years worked & Mean Autonomy.

Example 4.
Predict Income given Years worked & Mean Autonomy Score

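Income = 6294.13 + 30.45 * YearsWorked + 482.59 * MeanAutonomy

$\beta_0$ has the same units as the DV; $\beta_1$ and $\beta_2$ are in units of the DV per unit of their IV. Here, the DV (Gross Annual Income) is an amount of money, so $\beta_1$ is an amount of money per year worked and $\beta_2$ is an amount of money per point of mean autonomy score.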

Example 4.
Predict Income given Years worked & Mean Autonomy Score

Income = 6294.13 + 30.45 * YearsWorked + 482.59 * MeanAutonomy

5 Are the coefficients significant?

H0: $\beta_0$ is NOT significant, it should be 0.
H1: $\beta_0$ IS significant.

p < 0.05: Reject H0 in favour of H1.

Example 4.
Predict Income given Years worked & Mean Autonomy Score

Income = 6294.13 + 30.45 * YearsWorked + 482.59 * MeanAutonomy

5 Are the coefficients significant?

H0: Years worked is not a sig. predictor of income. $\beta_1$ should be 0.
H1: Years worked is a sig. predictor of income. $\beta_1$ is not 0.

p < 0.05: Reject H0 in favour of H1.

Example 4.
Predict Income given Years worked & Mean Autonomy Score

Income = 6294.13 + 30.45 * YearsWorked + 482.59 * MeanAutonomy

5 Are the coefficients significant?

H0: Mean autonomy score is not a sig. predictor of income. $\beta_2$ should be 0.
H1: Mean autonomy score is a sig. predictor of income. $\beta_2$ is not 0.

p < 0.05: Reject H0 in favour of H1.

Example 4.
Predict Income given Years worked & Mean Autonomy Score

6 Final conclusions
Using the number of years worked & mean autonomy as a model to predict income is statistically significant.
The Y-intercept and coefficients for both IVs (years worked & mean autonomy) are significant to the model.

Example 4.
Predict Income given Years worked & Mean Autonomy Score

6 Final conclusions
The final model is:
Income = 6294.13 + 30.45 * YearsWorked + 482.59 * MeanAutonomy

Interpreting the model:
The average starting salary is 6294.13 (no years worked & no autonomy).
On average, employees' income increases by 30.45 for every year worked for the company.
On average, employees' income increases by 482.59 for every 1 point increase in mean autonomy score.
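
The interpretation above can be checked by plugging values into the model. In this sketch the function name is made up, and the inputs (5 years worked, autonomy score of 3) are arbitrary. Note that each coefficient describes the change in income when its own variable goes up by one and the other variable stays the same.

```python
def predict_income(years_worked, mean_autonomy,
                   b0=6294.13, b1=30.45, b2=482.59):
    """Predicted gross annual income from the Example 4 model."""
    return b0 + b1 * years_worked + b2 * mean_autonomy


base = predict_income(5, 3)
one_more_year = predict_income(6, 3)   # autonomy held at 3
one_more_point = predict_income(5, 4)  # years worked held at 5

print(f"5 years, autonomy 3: {base:.2f}")
print(f"effect of one more year: {one_more_year - base:.2f}")
print(f"effect of one more autonomy point: {one_more_point - base:.2f}")
```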

Final Comments
If you use regression, you'll need to do a bit of reading about assumptions:
Normality, linearity, homoscedasticity & multicollinearity.

Also look into these:
What to do about outliers and how they can affect a model.
Independence of residuals.

2 good books:
Discovering Statistics Using SPSS, Andy Field
SPSS Survival Manual, Julie Pallant

@SK Academic Skills
Library

Contact Christine Pereira
Academic Skills Adviser
Library
Brunel University, Uxbridge
Middlesex, UB8 3PH, UK
E-mail christine.pereira2@brunel.ac.uk
Web www.brunel.ac.uk/library/ask
