
Simple Linear Regression and Correlation (Continued)

Reference: Chapter 17 of Statistics for Management and Economics, 7th Edition, Gerald Keller.

17.6 Using the Regression Equation

Why regression?
1. Analyze specific relations between Y and X: how is Y related to X?
2. Forecast (predict) the variable Y with the help of X. In this case, the relationship is assumed to be linear.

Two kinds of prediction


Point predictions

$$\hat{y} = b_0 + b_1 x$$

A point prediction does not provide any information about how closely the predicted value will match the true value.

Interval predictions
Prediction intervals for predicting y for a given value of x.
Confidence intervals for the average of y for a given x.
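A minimal sketch of a point prediction, assuming the coefficients b0 and b1 are the usual least-squares estimates. The data below are hypothetical (they are not the fast food data set), and the function name `fit_line` is illustrative only.

```python
def fit_line(x, y):
    """Return the least-squares estimates (b0, b1) for y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squared deviations of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical advertising budgets and sales (NOT the data from the slides)
adver = [200, 400, 600, 800, 1000]
sales = [100, 125, 140, 170, 180]

b0, b1 = fit_line(adver, sales)
x_g = 750                      # the given value of x
y_hat = b0 + b1 * x_g          # point prediction for x = x_g
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, point prediction = {y_hat:.3f}")
```

The single number `y_hat` says nothing about its own precision, which is why the interval predictions are needed.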

Example: fast food company


Predict the sales of one restaurant that has an advertising budget of $750 000.

[Figure: scatter diagram of SALES (vertical axis) against ADVER (horizontal axis)]

The prediction interval


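$$\hat{y} \pm t_{\alpha/2,\,n-2}\; s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_g - \bar{x})^2}{(n-1)\,s_x^2}}$$

Here $x_g$ is the given value of x, $s_\varepsilon$ is the standard deviation of the residuals, and $(n-1)\,s_x^2 = \sum_i (x_i - \bar{x})^2$.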
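The interval can be sketched as a small function of summary statistics. This is an illustration, not the course's own code: the function name, argument names, and all input values are hypothetical, and the critical value is assumed to be looked up in a t table (3.182 is the two-sided 5% value for 3 degrees of freedom).

```python
import math


def prediction_interval(y_hat, t_crit, s_eps, n, x_g, x_bar, s_x):
    """Prediction interval for one new y at x = x_g.

    t_crit is t_{alpha/2, n-2}, taken from a t table by the caller.
    s_eps is the standard deviation of the residuals, s_x the sample
    standard deviation of x.
    """
    sum_sq_dev_x = (n - 1) * s_x ** 2          # equals sum of (x_i - x_bar)^2
    half_width = t_crit * s_eps * math.sqrt(
        1 + 1 / n + (x_g - x_bar) ** 2 / sum_sq_dev_x
    )
    return y_hat - half_width, y_hat + half_width


# Hypothetical summary statistics (NOT the fast food data)
low, high = prediction_interval(
    y_hat=158.4, t_crit=3.182, s_eps=5.08, n=5, x_g=750, x_bar=600, s_x=316.2
)
print(f"95% prediction interval: ({low:.1f}, {high:.1f})")
```

The leading 1 under the square root reflects the variability of a single new observation, so the interval stays wide even for large n.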

Example: fast food company


Construct a 95% prediction interval for the sales of one restaurant that has an advertising budget of $750 000.

Confidence interval for the expected value

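$$E[Y \mid X = x_g]$$

$$\hat{y} \pm t_{\alpha/2,\,n-2}\; s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_g - \bar{x})^2}{(n-1)\,s_x^2}}$$

Here $x_g$ is the given value of x, $s_\varepsilon$ is the standard deviation of the residuals, and $(n-1)\,s_x^2 = \sum_i (x_i - \bar{x})^2$.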
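A sketch of the confidence interval for the expected value, written so that it can be compared with an interval for a single observation. Function and argument names are hypothetical, the numbers are made up for illustration, and the critical value is assumed to come from a t table.

```python
import math


def confidence_interval_mean(y_hat, t_crit, s_eps, n, x_g, x_bar, s_x):
    """Confidence interval for E[Y | X = x_g].

    t_crit is t_{alpha/2, n-2}, supplied by the caller from a t table.
    """
    sum_sq_dev_x = (n - 1) * s_x ** 2          # equals sum of (x_i - x_bar)^2
    half_width = t_crit * s_eps * math.sqrt(
        1 / n + (x_g - x_bar) ** 2 / sum_sq_dev_x
    )
    return y_hat - half_width, y_hat + half_width


# Hypothetical summary statistics (NOT the fast food data)
low, high = confidence_interval_mean(
    y_hat=158.4, t_crit=3.182, s_eps=5.08, n=5, x_g=750, x_bar=600, s_x=316.2
)
print(f"95% confidence interval for the mean: ({low:.1f}, {high:.1f})")
```

Because the term 1 is absent under the square root, this interval is always narrower than the interval for an individual value at the same x.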

Example: fast food company


Construct a 95% confidence interval for the mean sales of restaurants that have an advertising budget of $750 000.

[Figure: scatter diagram of SALES (vertical axis) against ADVER (horizontal axis)]

17.7 Regression Diagnostics - I


The three conditions required for the validity of the regression analysis are:
The error variable is normally distributed.
The error variance is constant for all values of x.
The errors are independent of each other.

How can we diagnose violations of these conditions?



Residual Analysis
Most departures from the required conditions can be diagnosed by residual analysis.

$$\text{Standardized residual} = \frac{\text{residual} - \text{mean of the residuals}}{\text{standard deviation of the residuals}}$$

For our case, the standardized i-th residual is

$$\frac{e_i}{s_\varepsilon}$$

Food company, 1st observation: when ADVER = 276, SALES = 115.0.
Predicted SALES = 118.0087
Residual = 115.0 − 118.0087 = −3.008726
Standardized residual = −3.008726 / 16.09 = −0.1869935
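The arithmetic for the first observation can be reproduced with a short sketch. The function name is illustrative; the inputs are the values quoted above, with the predicted value as printed (rounded to four decimals), so the last digits differ slightly.

```python
def standardized_residual(observed, predicted, s_eps):
    """Residual divided by the standard deviation of the residuals."""
    return (observed - predicted) / s_eps


# First observation of the food company example: ADVER = 276, SALES = 115.0
z = standardized_residual(observed=115.0, predicted=118.0087, s_eps=16.09)
print(f"standardized residual = {z:.4f}")
```

The sign is negative because the observed sales fall below the fitted line.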

Nonnormality
Nonnormality of the residuals can be checked by making a histogram of the residuals.

Food Company Example
[Figure: histogram of the residuals (horizontal axis: Residual, from -30 to 30; vertical axis: Frequency)]
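As a rough, text-only stand-in for the histogram, residuals can be counted in bins of width 10. The residuals below are hypothetical (not the food company residuals) and the function name `residual_histogram` is illustrative.

```python
import math
from collections import Counter


def residual_histogram(residuals, bin_width=10.0):
    """Count residuals per bin [k*w, (k+1)*w); returns {(lower, upper): count}."""
    counts = Counter(math.floor(e / bin_width) for e in residuals)
    return {
        (k * bin_width, (k + 1) * bin_width): counts[k]
        for k in sorted(counts)
    }


# Hypothetical residuals (NOT the data from the slides)
residuals = [-3.0, 4.0, 12.0, -15.0, 7.5]
for (lower, upper), count in residual_histogram(residuals).items():
    print(f"[{lower:6.1f}, {upper:6.1f}): {'*' * count}")
```

A roughly symmetric, bell-shaped pattern of counts is consistent with normality. With as few observations as in this example, the check is only indicative.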

Heteroscedasticity
Variance of the errors is not constant (Violation of the requirement)

Homoscedasticity
Variance of the errors is constant (no violation of the requirement).
Check: plot the residuals against the values of Y predicted by the model.

[Figure: residuals (vertical axis) plotted against predicted SALES values (horizontal axis)]

Nonindependence of the error variable

Outliers
An outlier is an observation that is unusually small or large. Several possibilities need to be investigated when an outlier is observed:
There was an error in measuring or recording the value.
The point does not belong in the sample.
The observation is valid.

Identify outliers from the scatter diagram. It is customary to suspect that an observation is an outlier if |standardized residual| > 2.
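The rule of thumb can be sketched as a simple filter. The function name and the residuals are hypothetical; only the value 16.09 for the standard deviation of the residuals is taken from the food company example.

```python
def flag_outliers(residuals, s_eps, threshold=2.0):
    """Return the indices whose |standardized residual| exceeds the threshold."""
    return [
        i for i, e in enumerate(residuals)
        if abs(e / s_eps) > threshold
    ]


# Hypothetical residuals (NOT the data from the slides)
residuals = [-3.0, 35.0, 10.0, -40.0]
suspects = flag_outliers(residuals, s_eps=16.09)
print("suspected outliers at positions:", suspects)
```

A flagged observation is only a suspect. The three possibilities listed above still have to be investigated before deciding what to do with it.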

Influential observations
[Figure: two scatter diagrams of SALES (vertical axis) against ADVER (horizontal axis)]

Testing the coefficient of correlation


The coefficient of correlation is used to measure the strength of association between two variables. The coefficient values range between -1 and 1.

If r = -1 (negative association) or r = +1 (positive association) every point falls on the regression line. If r = 0 there is no linear pattern.
The coefficient can be used to test for a linear relationship between two variables.

To test the coefficient of correlation for a linear relationship between X and Y:

X and Y must be observational.
X and Y must be bivariate normally distributed.

When no linear relationship exists between the two variables, $\rho = 0$. The hypotheses are:

$$H_0: \rho = 0 \qquad H_1: \rho \neq 0$$

The test statistic is:

$$t = r\sqrt{\frac{n-2}{1-r^2}}$$

The statistic is Student t distributed with d.f. = n - 2, provided the variables are bivariate normally distributed.


Food Company Example...


Sample correlation coefficient between ADVER and SALES:

$$r = 0.826$$

Test statistic:

$$t = 0.826\sqrt{\frac{9-2}{1-0.826^2}} = 3.877074$$
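The test statistic for the food company example can be checked with a short sketch. The function name is illustrative; r = 0.826 and n = 9 are the values used in the example.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


t = correlation_t_statistic(r=0.826, n=9)
print(f"t = {t:.4f} with {9 - 2} degrees of freedom")
```

The statistic is then compared with the Student t critical value for n - 2 = 7 degrees of freedom.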

Testing the Coefficient of correlation


Foreign Index Funds (Index)
A certain investor prefers to invest in index mutual funds, which are constructed by buying a wide assortment of stocks. The investor decides to avoid investing in a Japanese index fund if it is strongly correlated with an American index fund that he already owns. Based on the data in Index, should he avoid the investment in the Japanese index fund?

[Figure: scatter diagram of USINDEX (vertical axis) against JAPINDEX (horizontal axis), both ranging from -0.2 to 0.2]

Testing the Coefficient of Correlation


Solution
Problem objective: Analyze the relationship between two interval variables.
The two variables are observational (the return for each fund was not controlled).
We are interested in whether there is a linear relationship between the two variables; thus, we need to test the coefficient of correlation.

The sample coefficient of correlation:

$$r = \frac{\mathrm{cov}(x,y)}{s_x s_y} = .491$$

(cov(x,y) = .001279; s_x = .0509; s_y = .0512)

The value of the t statistic is

$$t = r\sqrt{\frac{n-2}{1-r^2}} = 4.26$$

The rejection region: $|t| > t_{\alpha/2,\,n-2} = t_{.025,\,59-2} \approx 2.000$.

Conclusion: There is sufficient evidence at α = 5% to infer that there is a linear relationship between the two variables.

Correlations

                                  USINDEX    JAPINDEX
USINDEX    Pearson Correlation    1          ,491**
           Sig. (2-tailed)        ,          ,000
           N                      59         59
JAPINDEX   Pearson Correlation    ,491**     1
           Sig. (2-tailed)        ,000       ,
           N                      59         59

**. Correlation is significant at the 0.01 level (2-tailed).

Procedure for Regression Diagnostics


1. Develop a model that has a theoretical basis.
2. Gather data for the two variables in the model.
3. Draw the scatter diagram to determine whether a linear model appears to be appropriate.
4. Determine the regression equation.
5. Check the required conditions for the errors.
6. Check for outliers and influential observations.
7. Assess the model fit.
8. If the model fits the data, use the regression equation.
