
Copyright 1996 Lawrence C. Marsh

Chapter 3
3.1 The Simple Linear Regression Model


Copyright 1997 John Wiley & Sons, Inc. All rights reserved. Reproduction or translation of this work beyond that permitted in Section 117 of the 1976 United States Copyright Act without the express written permission of the copyright owner is unlawful. Requests for further information should be addressed to the Permissions Department, John Wiley & Sons, Inc. The purchaser may make back-up copies for his/her own use only and not for distribution or resale. The Publisher assumes no responsibility for errors, omissions, or damages caused by the use of these programs or from the use of the information contained herein.


3.2 Purpose of Regression Analysis

1. Estimate a relationship among economic variables, such as y = f(x).
2. Forecast or predict the value of one variable, y, based on the value of another variable, x.


3.3 Weekly Food Expenditures

y = dollars spent each week on food items
x = consumer's weekly income

The relationship between x and the expected value of y, given x, might be linear:

E(y|x) = β₁ + β₂x


3.4 Figure 3.1a Probability distribution f(y|x=480) of food expenditures given income x = $480.


3.5 Figure 3.1b Probability distributions f(y|x=480) and f(y|x=800) of food expenditures given income x = $480 and x = $800.


3.6 Figure 3.2 The economic model: a linear relationship between average expenditure on food and income, E(y|x) = β₁ + β₂x, with intercept β₁ and slope β₂ = ΔE(y|x)/Δx.

3.7 Homoskedastic Case
Figure 3.3 The probability density function f(yₜ) for yₜ at two levels of household income, x₁ = 480 and x₂ = 800.

3.8 Heteroskedastic Case
Figure 3.3+ The variance of yₜ increases as household income, xₜ, increases.


3.9 Assumptions of the Simple Linear Regression Model - I

1. The average value of y, given x, is given by the linear regression: E(y) = β₁ + β₂x
2. For each value of x, the values of y are distributed around their mean with variance: var(y) = σ²
3. The values of y are uncorrelated, having zero covariance and thus no linear relationship: cov(yᵢ, yⱼ) = 0
4. The variable x must take at least two different values, so that x ≠ c, where c is a constant.


3.10 One more assumption that is often used in practice but is not required for least squares:

5. (optional) The values of y are normally distributed about their mean for each value of x: y ~ N(β₁ + β₂x, σ²)


3.11 The Error Term

y is a random variable composed of two parts:

I. Systematic component: E(y) = β₁ + β₂x. This is the mean of y.
II. Random component: e = y - E(y) = y - β₁ - β₂x. This is called the random error.

Together E(y) and e form the model: y = β₁ + β₂x + e
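As a minimal simulation sketch of this decomposition, with assumed illustrative values β₁ = 40, β₂ = 0.13, and σ = 8 (none of which come from these slides):

```python
import numpy as np

rng = np.random.default_rng(42)
beta1, beta2, sigma = 40.0, 0.13, 8.0      # assumed illustrative values

x = rng.uniform(480.0, 800.0, size=1000)   # weekly income
e = rng.normal(0.0, sigma, size=1000)      # random component, E(e) = 0
y = beta1 + beta2 * x + e                  # systematic part plus error

# y averages out to its systematic component E(y) = beta1 + beta2*x.
print(np.mean(y - (beta1 + beta2 * x)))    # close to 0
```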

3.12 Figure 3.5 The relationship among y, e, and the true regression line E(y) = β₁ + β₂x: each observation yₜ differs from the line by its error eₜ.


3.13 Figure 3.7a The relationship among y, ê, and the fitted regression line ŷ = b₁ + b₂x.


3.14 Figure 3.7b The fitted line ŷ = b₁ + b₂x compared with any other line ŷ* = b₁* + b₂*x: the sum of squared residuals from any other line will be larger.

3.15 Figure 3.4 Probability density functions f(e) for e and f(y) for y; f(y) is centered at β₁ + β₂x.

3.16 The Error Term Assumptions

1. The value of y, for each value of x, is: y = β₁ + β₂x + e
2. The average value of the random error e is: E(e) = 0
3. The variance of the random error e is: var(e) = σ² = var(y)
4. The covariance between any pair of e's is: cov(eᵢ, eⱼ) = cov(yᵢ, yⱼ) = 0
5. x must take at least two different values so that x ≠ c, where c is a constant.
6. (optional) e is normally distributed with mean 0 and var(e) = σ²: e ~ N(0, σ²)


3.17 Unobservable Nature of the Error Term

1. Unspecified factors / explanatory variables, not in the model, may be in the error term.
2. Approximation error is in the error term if the relationship between y and x is not exactly linear.
3. Strictly unpredictable random behavior that may be unique to that observation is in the error term.


3.18

Population regression values: yₜ = β₁ + β₂xₜ + eₜ
Population regression line: E(yₜ|xₜ) = β₁ + β₂xₜ

Sample regression values: yₜ = b₁ + b₂xₜ + êₜ
Sample regression line: ŷₜ = b₁ + b₂xₜ


3.19

yₜ = β₁ + β₂xₜ + eₜ  ⇒  eₜ = yₜ - β₁ - β₂xₜ

Minimize the error sum of squared deviations:

S(β₁, β₂) = Σₜ₌₁ᵀ (yₜ - β₁ - β₂xₜ)²    (3.3.4)

3.20 Minimize w.r.t. β₁ and β₂:

S(β₁, β₂) = Σₜ₌₁ᵀ (yₜ - β₁ - β₂xₜ)²    (3.3.4)

∂S(.)/∂β₁ = -2 Σ (yₜ - β₁ - β₂xₜ)
∂S(.)/∂β₂ = -2 Σ xₜ (yₜ - β₁ - β₂xₜ)

Set each of these two derivatives equal to zero and solve these two equations for the two unknowns β₁ and β₂.
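A quick symbolic check of these two derivatives with SymPy, using a tiny sample of T = 3 purely for illustration:

```python
import sympy as sp

b1, b2 = sp.symbols('beta1 beta2')
xs = sp.symbols('x1:4')   # tiny sample, T = 3, for illustration
ys = sp.symbols('y1:4')

# S(beta1, beta2) = sum of squared deviations, equation (3.3.4).
S = sum((y - b1 - b2 * x) ** 2 for x, y in zip(xs, ys))

# The partials should match slide 3.20's expressions.
lhs1 = sp.diff(S, b1)
rhs1 = -2 * sum(y - b1 - b2 * x for x, y in zip(xs, ys))
lhs2 = sp.diff(S, b2)
rhs2 = -2 * sum(x * (y - b1 - b2 * x) for x, y in zip(xs, ys))
print(sp.simplify(lhs1 - rhs1), sp.simplify(lhs2 - rhs2))  # 0 0
```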

3.21 Minimize w.r.t. β₁ and β₂:

S(.) = Σₜ₌₁ᵀ (yₜ - β₁ - β₂xₜ)²

∂S(.)/∂βᵢ < 0 to the left of the minimum, ∂S(.)/∂βᵢ = 0 at the minimum βᵢ = bᵢ, and ∂S(.)/∂βᵢ > 0 to the right.

To minimize S(.), you set the two derivatives equal to zero to get:

3.22

∂S(.)/∂β₁ = -2 Σ (yₜ - b₁ - b₂xₜ) = 0
∂S(.)/∂β₂ = -2 Σ xₜ (yₜ - b₁ - b₂xₜ) = 0

When these two derivatives are set to zero, β₁ and β₂ become b₁ and b₂ because they no longer represent just any values of β₁ and β₂ but the special values that correspond to the minimum of S(.).


3.23

-2 Σ (yₜ - b₁ - b₂xₜ) = 0
-2 Σ xₜ (yₜ - b₁ - b₂xₜ) = 0

Σyₜ - Tb₁ - b₂ Σxₜ = 0
Σxₜyₜ - b₁ Σxₜ - b₂ Σxₜ² = 0

Tb₁ + b₂ Σxₜ = Σyₜ
b₁ Σxₜ + b₂ Σxₜ² = Σxₜyₜ
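As a sketch, these two normal equations can be solved directly as a 2×2 linear system; the x and y data here are hypothetical, invented for illustration:

```python
import numpy as np

# Hypothetical data for illustration only.
x = np.array([480., 560., 640., 720., 800.])
y = np.array([110., 125., 130., 155., 160.])
T = len(x)

# The two normal equations as a 2x2 linear system:
#   T*b1      + b2*sum(x)    = sum(y)
#   b1*sum(x) + b2*sum(x**2) = sum(x*y)
A = np.array([[T, x.sum()], [x.sum(), (x ** 2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])
b1, b2 = np.linalg.solve(A, rhs)
print(b1, b2)
```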


3.24

Tb₁ + b₂ Σxₜ = Σyₜ
b₁ Σxₜ + b₂ Σxₜ² = Σxₜyₜ

Solve for b₁ and b₂ using the definitions of x̄ and ȳ:

b₂ = (T Σxₜyₜ - Σxₜ Σyₜ) / (T Σxₜ² - (Σxₜ)²)

b₁ = ȳ - b₂x̄
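A minimal NumPy sketch of these closed-form formulas, reusing the same hypothetical data as above; the final line checks that the residuals satisfy both normal equations, Σêₜ = 0 and Σxₜêₜ = 0:

```python
import numpy as np

# Same hypothetical data as in the normal-equations sketch above.
x = np.array([480., 560., 640., 720., 800.])
y = np.array([110., 125., 130., 155., 160.])
T = len(x)

# Closed-form least squares estimates from slide 3.24.
b2 = (T * (x * y).sum() - x.sum() * y.sum()) / (T * (x ** 2).sum() - x.sum() ** 2)
b1 = y.mean() - b2 * x.mean()

# At (b1, b2) the residuals satisfy sum(e_hat) = 0 and sum(x*e_hat) = 0.
e_hat = y - b1 - b2 * x
print(b1, b2)
print(np.isclose(e_hat.sum(), 0.0), np.isclose((x * e_hat).sum(), 0.0))
```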


3.25 Elasticities

η = (Δy/Δx)(x/y) = (Δy/y) / (Δx/x) = percentage change in y / percentage change in x

Using calculus, we can get the elasticity at a point:

η = lim(Δx→0) (Δy/Δx)(x/y) = (∂y/∂x)(x/y)


3.26 Applying Elasticities

E(y) = β₁ + β₂x

∂E(y)/∂x = β₂

η = (∂E(y)/∂x) (x / E(y)) = β₂ x / E(y)
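A small numeric sketch of this formula, with assumed illustrative values β₁ = 4 and β₂ = 1.5 (not given on this slide): in a linear model the elasticity β₂x/E(y) changes with x.

```python
# Assumed illustrative parameter values.
beta1, beta2 = 4.0, 1.5

for x in (2.0, 4.0, 8.0):
    Ey = beta1 + beta2 * x        # E(y) = beta1 + beta2 * x
    eta = beta2 * x / Ey          # eta = beta2 * x / E(y)
    print(x, round(eta, 3))       # elasticity varies with x
```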


3.27 Estimating Elasticities

η̂ = (∂ŷ/∂x)(x̄/ȳ) = b₂ (x̄/ȳ)

ŷₜ = b₁ + b₂xₜ = 4 + 1.5xₜ

x̄ = 8 = average number of years of experience
ȳ = $10 = average wage rate

η̂ = b₂ (x̄/ȳ) = 1.5 × (8/10) = 1.2
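The same calculation as a one-line check, using the values from this slide:

```python
b2, x_bar, y_bar = 1.5, 8.0, 10.0   # slide 3.27 example values
print(b2 * x_bar / y_bar)           # 1.2
```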


3.28 Prediction

Estimated regression equation: ŷₜ = 4 + 1.5xₜ

xₜ = years of experience
ŷₜ = predicted wage rate

If xₜ = 2 years, then ŷₜ = $7.00 per hour.
If xₜ = 3 years, then ŷₜ = $8.50 per hour.
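A minimal sketch of these predictions; the helper name predict_wage is mine, not the slides':

```python
def predict_wage(years: float) -> float:
    """Predicted hourly wage from the fitted line y_hat = 4 + 1.5*x."""
    b1, b2 = 4.0, 1.5   # estimates used on slide 3.28
    return b1 + b2 * years

print(predict_wage(2))  # 7.0
print(predict_wage(3))  # 8.5
```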


3.29 Log-Log Models

ln(y) = β₁ + β₂ ln(x)

∂ln(y)/∂x = β₂ ∂ln(x)/∂x

(1/y)(∂y/∂x) = β₂ (1/x)

3.30

(∂y/∂x)(x/y) = β₂

Elasticity of y with respect to x: η = (∂y/∂x)(x/y) = β₂
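A simulation sketch of this constant-elasticity property, with assumed illustrative values β₁ = 1 and β₂ = 0.8: the slope from regressing ln(y) on ln(x) recovers β₂ at every x.

```python
import numpy as np

rng = np.random.default_rng(0)
beta1, beta2 = 1.0, 0.8   # assumed illustrative values

x = rng.uniform(1.0, 100.0, size=500)
y = np.exp(beta1 + beta2 * np.log(x) + rng.normal(0.0, 0.05, size=500))

# Slope of ln(y) on ln(x) is the (constant) elasticity beta2.
slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
print(slope)  # close to 0.8
```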
