
Correlation Scatterplots Regression

Generating results in SPSS
Reading SPSS output

Correlation, Scatterplots and Regression


Correlation measures the strength and the direction of a relationship.
Scatterplots present a visual image of the data.
Regression produces a best-fit line to predict the dependent variable from the independent variable.
The significance of the relationship is tested with correlation or regression.

Correlation: Linear Relationships


Strong relationship = good linear fit
[Figure: two scatterplots of Symptom Index (Y) against dose in mg (X). Drug A: points cluster tightly around the line (very good fit). Drug B: points are more spread out (moderate fit).]

Points clustered closely around a line show a strong correlation. The line is a good predictor (a good fit) for the data. The more spread out the points, the weaker the correlation and the poorer the fit. The line is a REGRESSION line (Y = bX + a).

Interpreting Correlation Coefficient r

strong correlation: r > .70 or r < -.70
moderate correlation: r is between .30 and .70 or r is between -.30 and -.70
weak correlation: r is between 0 and .30 or r is between 0 and -.30
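As a quick sketch outside the SPSS workflow, these rules of thumb can be expressed as a small Python function; the name `describe_r` is made up for illustration:

```python
def describe_r(r):
    """Rule-of-thumb label for a Pearson correlation coefficient."""
    size = abs(r)                     # strength is |r|; direction is the sign
    if size > 0.70:
        strength = "strong"
    elif size >= 0.30:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative" if r < 0 else "no"
    return f"{strength} {direction} correlation"

print(describe_r(0.173))   # weak positive correlation
print(describe_r(-0.85))   # strong negative correlation
```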

Running Correlation in SPSS: Strength, Direction, Significance


Click Analyze > Correlate > Bivariate
Move the two variables into the box
Click OK

Correlations

                                                    age AGE OF    rincome RESPONDENTS
                                                    RESPONDENT    INCOME
age AGE OF RESPONDENT        Pearson Correlation       1              .173**
                             Sig. (2-tailed)                          .000
                             N                      2803              1798
rincome RESPONDENTS INCOME   Pearson Correlation    .173**            1
                             Sig. (2-tailed)        .000
                             N                      1798              1801

**. Correlation is significant at the 0.01 level (2-tailed).

SPSS Correlation Output


Correlations

The value of the Correlation Coefficient is on the first line: r = +.173
  The relationship is positive.
  The relationship is weak.

                                                    age AGE OF    rincome RESPONDENTS
                                                    RESPONDENT    INCOME
age AGE OF RESPONDENT        Pearson Correlation       1              .173**
                             Sig. (2-tailed)                          .000
                             N                      2803              1798
rincome RESPONDENTS INCOME   Pearson Correlation    .173**            1
                             Sig. (2-tailed)        .000
                             N                      1798              1801

**. Correlation is significant at the 0.01 level (2-tailed).

The p-value (Significance) is on the second line: p < .001 (whenever SPSS shows .000)
  The relationship is significant: reject H0.
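Outside SPSS, the same Pearson r and two-tailed p-value can be computed with Python's scipy; a sketch using simulated data (the actual GSS file is not assumed):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(42)
n = 500
age = rng.uniform(18, 89, n)
# Simulate income levels with a weak positive dependence on age,
# loosely mimicking the slides' b = .037 and intercept = 8.864
income = 0.037 * age + rng.normal(8.864, 2.8, n)

r, p = pearsonr(age, income)          # Pearson r and two-tailed p-value
print(f"r = {r:+.3f}, p = {p:.3g}")   # a weak positive, significant correlation
```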

Correlation for Your Project


Your dependent variable is interval/ratio.
Look at the data set and select one other interval/ratio variable that might be related to (predictive of) your dependent variable.
Following the instructions above:
  run a correlation with that variable
  run a scatterplot of the variable

GENERATE A SCATTERPLOT TO SEE THE RELATIONSHIPS


Go to Graphs > Legacy Dialogs > Scatter/Dot > Simple Scatter

Click on the DEPENDENT variable and move it to the Y-Axis.
Click on the OTHER variable and move it to the X-Axis.
Click OK.

The scatterplot might not look promising at first. Double-click on the chart to open a Chart Editor window, then use Options > Bin Element.

Simply CLOSE this box; bins are applied automatically.

BINS
Dot size now shows the number of cases with each pair of X, Y values.

DO NOT CLOSE CHART EDITOR YET!

Add Fit Line (Regression)


In the Chart Editor: Elements > Fit Line at Total
Close the dialog box that opens
Close the Chart Editor window

Edited Scatterplot
The distribution of cases is shown by the dots (bins); the trend is shown by the fit line.

Regression
Regression predicts the Dependent Variable based on the Independent Variable.
It computes a best-fit line for prediction.
The output includes the slope and intercept of the line.

Hypothesis Test based on ANOVA


SStotal is computed.
SStotal is divided into Regression (predicted) and Error (random) components.

Effect size = R² for regression

SPSS for Regression


Analyze > Regression > Linear

Simple Linear Regression (One independent variable)


Move the Dependent Variable into the box marked Dependent.
Move the Independent Variable into the box marked Independent(s).
Click OK.

Regression Output
Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .173a   .030       .029                2.838

a. Predictors: (Constant), age AGE OF RESPONDENT

Each element of the output is considered separately on the following slides.


ANOVAb

Model 1        Sum of Squares   df     Mean Square   F        Sig.
Regression          445.824        1       445.824   55.359   .000a
Residual          14463.845     1796         8.053
Total             14909.669     1797

a. Predictors: (Constant), age AGE OF RESPONDENT
b. Dependent Variable: rincome RESPONDENTS INCOME

Coefficientsa

                            Unstandardized        Standardized
                            Coefficients          Coefficients
Model 1                     B        Std. Error   Beta           t        Sig.
(Constant)                  8.864    .224                        39.598   .000
age AGE OF RESPONDENT       .037     .005         .173           7.440    .000

a. Dependent Variable: rincome RESPONDENTS INCOME

ANOVA Table
ANOVAb

Model 1        Sum of Squares   df     Mean Square   F        Sig.
Regression          445.824        1       445.824   55.359   .000a
Residual          14463.845     1796         8.053
Total             14909.669     1797

a. Predictors: (Constant), age AGE OF RESPONDENT
b. Dependent Variable: rincome RESPONDENTS INCOME

Regression SS refers to variability related to the Independent Variable (the treatment). Residual SS refers to variability not related to the Independent Variable (the error or chance element). For regression, the df for the treatment is 1 per variable. Compute MS and F in the same way as in ANOVA. If the p-value (Sig.) < α, the Regression line fits the data better than a flat line; the relationship is significant.
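The MS and F arithmetic can be checked by hand; a minimal Python sketch using the sums of squares from the ANOVA table in the slides:

```python
# Sums of squares and df from the ANOVA table in the slides
ss_regression, df_regression = 445.824, 1
ss_residual, df_residual = 14463.845, 1796

ms_regression = ss_regression / df_regression   # 445.824
ms_residual = ss_residual / df_residual         # ≈ 8.053
f_ratio = ms_regression / ms_residual           # ≈ 55.359

print(f"MS(residual) = {ms_residual:.3f}, F = {f_ratio:.3f}")
```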

The Regression Line Equation


Y = bX + a
b is the coefficient for the Independent Variable (the slope)
a is the constant coefficient (the intercept)
Predict values of Y based on values of X
Coefficientsa

                            Unstandardized        Standardized
                            Coefficients          Coefficients
Model 1                     B        Std. Error   Beta           t        Sig.
(Constant)                  8.864    .224                        39.598   .000
age AGE OF RESPONDENT       .037     .005         .173           7.440    .000

a. Dependent Variable: rincome RESPONDENTS INCOME
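Plugging the unstandardized coefficients into Y = bX + a gives the prediction equation; a small Python sketch (the function name is made up for illustration):

```python
b = 0.037   # slope for age, from the Coefficients table
a = 8.864   # constant (intercept)

def predicted_income_level(age):
    """Predicted income level (GSS category) for a respondent's age."""
    return b * age + a

for years in (20, 40, 60):
    print(f"age {years}: predicted income level {predicted_income_level(years):.2f}")
```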

Effect Size: R²

In regression, the effect size is similar to η² in ANOVA: SSregression / SStotal. It is represented by R² (capital R). For simple regression (one variable), use the R-Square figure.

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .173a   .030       .029                2.838

a. Predictors: (Constant), age AGE OF RESPONDENT
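R² can be recovered directly from the ANOVA table's sums of squares; a quick Python check using the numbers from the slides:

```python
# Sums of squares from the ANOVA table in the slides
ss_regression = 445.824
ss_total = 14909.669

r_squared = ss_regression / ss_total
print(f"R Square = {r_squared:.3f}")   # matches the .030 in the Model Summary
```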

Sample Write-Up
Data from the 2004 General Social Survey were used to explore the relationship between age and income, as most Americans expect to earn more money after years in the workforce. Respondents' age showed a weak positive correlation with income level (r = .173, p < .001). Linear regression demonstrated a significant positive relationship (F(1, 1796) = 55.359, p < .001). Income increased approximately one-third of an income level for each additional decade of age (b = .037). Due to the large range of income levels at every age (see Figure 1), age accounts for only 3% of the variability in income levels. Older people do tend to earn higher incomes, but other characteristics are probably better predictors of income than age.
