
1 Simple Linear Regression

Simple linear regression is probably the starting point for constructing econometric theory. The idea behind regression is to explain the dependence of one variable $y_i$, usually referred to as the dependent variable, on another variable $x_i$ (the independent variable) using the simple form
$$y_i = \alpha + \beta x_i + \varepsilon_i,$$
where $\alpha$ is the intercept, $\beta$ is the slope, and $\varepsilon_i$ is an unobserved random variable called the error. The subscript $i$ indicates that the observation belongs to individual $i$. The parameters $\alpha$ and $\beta$ are fixed and unknown to the researcher. Since $\alpha$ and $E\varepsilon_i$ are not separately identified, the restriction $E\varepsilon_i = 0$ is imposed. Then
$$Ey_i = \alpha + \beta x_i,$$
but in fact what we have is the conditional mean
$$E(y_i \mid x_i) = \alpha + \beta x_i,$$
which is assumed to be linear in $x_i$. In real applications it is almost never exactly linear; we just hope that the specified conditional mean is a reasonable approximation. If we assume that the pair $(y_i, x_i)$ has a bivariate normal distribution, then $E(y_i \mid x_i)$ is indeed linear in $x_i$. Bivariate normality of $(y_i, x_i)$ seems to be a reasonable assumption for a number of applications, including, for example, dental care expenditure and income.
The term regression goes back to the work of Francis Galton, who investigated the relationship between the heights of fathers and sons. In general, tall fathers tend to have tall sons and short fathers tend to have short sons. However, very tall fathers tend to have sons shorter than themselves and very short fathers tend to have sons taller than themselves. Galton called this regression towards the mean, and that is why we call this model linear regression.

1.1 Least Squares Estimator

Assume that $n$ independent individuals are observed, $i = 1, \ldots, n$. Define the following quantities:
$$\bar x = \frac{1}{n}\sum_{i=1}^n x_i \quad \text{and} \quad \bar y = \frac{1}{n}\sum_{i=1}^n y_i,$$
$$S_{xx} = \sum_{i=1}^n (x_i - \bar x)^2, \qquad S_{yy} = \sum_{i=1}^n (y_i - \bar y)^2, \qquad S_{xy} = \sum_{i=1}^n (x_i - \bar x)(y_i - \bar y).$$
Also define the residual sum of squares as
$$RSS = \sum_{i=1}^n \bigl(y_i - (\alpha + \beta x_i)\bigr)^2.$$

The least squares estimates of the parameters $\alpha$ and $\beta$ are defined as the values $\hat\alpha$ and $\hat\beta$ that minimize $RSS$.

Theorem
$$\min_a \sum_{i=1}^n (x_i - a)^2 = \sum_{i=1}^n (x_i - \bar x)^2.$$

We leave the proof as an exercise. Then, for any given $\hat\beta$, the minimizing value $\hat\alpha$ is
$$\hat\alpha = \frac{1}{n}\sum_{i=1}^n \bigl(y_i - \hat\beta x_i\bigr) = \bar y - \hat\beta \bar x.$$
Then
$$\sum_{i=1}^n \bigl(y_i - \hat\alpha - \hat\beta x_i\bigr)^2 = \sum_{i=1}^n \bigl((y_i - \bar y) - \hat\beta (x_i - \bar x)\bigr)^2 = S_{yy} - 2\hat\beta S_{xy} + \hat\beta^2 S_{xx}.$$
Taking the derivative of this with respect to $\hat\beta$ and setting it to zero leads to the solution
$$\hat\beta = \frac{S_{xy}}{S_{xx}},$$
which is a minimum since the coefficient on $\hat\beta^2$ is positive. The least squares estimates of $\alpha$ and $\beta$ are therefore
$$\hat\beta_{OLS} = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^n (x_i - \bar x)^2}, \qquad \hat\alpha_{OLS} = \bar y - \hat\beta_{OLS}\,\bar x.$$
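As a quick numerical illustration of these formulas, here is a minimal Python sketch (the simulated data, sample size, and true parameter values are hypothetical; numpy is assumed to be available) that computes $\hat\beta_{OLS}$ and $\hat\alpha_{OLS}$ directly from $S_{xy}$, $S_{xx}$ and the sample means, and cross-checks the result against numpy's built-in least squares fit.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    alpha_true, beta_true = 1.0, 0.5            # hypothetical true parameters
    x = rng.normal(3.0, 1.0, size=n)
    y = alpha_true + beta_true * x + rng.normal(0.0, 1.0, size=n)

    # Sample means and the S-quantities defined above
    x_bar, y_bar = x.mean(), y.mean()
    S_xx = np.sum((x - x_bar) ** 2)
    S_xy = np.sum((x - x_bar) * (y - y_bar))

    beta_hat = S_xy / S_xx                      # slope: S_xy / S_xx
    alpha_hat = y_bar - beta_hat * x_bar        # intercept: y_bar - beta_hat * x_bar
    print(alpha_hat, beta_hat)

    # Cross-check against numpy's polynomial least squares fit
    print(np.polyfit(x, y, deg=1))              # returns [slope, intercept]

The two sets of estimates should agree up to floating-point error.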

1.2 Best Linear Unbiased Estimator

When there is no intercept in the regression, the estimator of the slope coefficient collapses to
$$\hat\beta_{OLS} = \frac{\sum_{i=1}^n x_i y_i}{\sum_{i=1}^n x_i^2}.$$
Next we show that in this model the least squares estimator is the best linear unbiased estimator (BLUE). The estimator is linear because it can be presented in the form
$$\hat\beta = \sum_{i=1}^n d_i y_i.$$
It is also unbiased:
$$E\hat\beta = \frac{\sum_{i=1}^n x_i\, E y_i}{\sum_{i=1}^n x_i^2} = \frac{\beta \sum_{i=1}^n x_i^2}{\sum_{i=1}^n x_i^2} = \beta.$$
Since for a general linear estimator
$$E\hat\beta = \sum_{i=1}^n d_i\, E y_i = \beta \sum_{i=1}^n d_i x_i,$$
the unbiasedness of the estimator means that
$$\sum_{i=1}^n d_i x_i = 1.$$
Assume that the $y_i$ are independent with common variance $\sigma^2$. Then
$$\operatorname{Var}\hat\beta = \sigma^2 \sum_{i=1}^n d_i^2,$$
and to find the $d_i$ for which the variance of the estimator $\hat\beta$ is smallest we have to minimize $\sum_{i=1}^n d_i^2\,\sigma^2$ subject to the constraint $\sum_{i=1}^n d_i x_i = 1$. Minimize
$$\sum_{i=1}^n d_i^2 - \lambda\left(\sum_{i=1}^n d_i x_i - 1\right).$$
The derivative with respect to $d_i$ gives
$$2 d_i - \lambda x_i = 0, \qquad d_i = \frac{\lambda x_i}{2}.$$
Multiplying by $x_i$ and summing over $i$,
$$\sum_{i=1}^n d_i x_i = \frac{\lambda}{2}\sum_{i=1}^n x_i^2.$$
Then, since $\sum_{i=1}^n d_i x_i = 1$,
$$\lambda = \frac{2}{\sum_{i=1}^n x_i^2}.$$
Hence
$$d_i = \frac{x_i}{\sum_{i=1}^n x_i^2},$$
which gives the least squares estimator
$$\hat\beta_{OLS} = \frac{\sum_{i=1}^n x_i y_i}{\sum_{i=1}^n x_i^2}.$$
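A small simulation can illustrate the BLUE property. The sketch below (hypothetical simulated data and parameter values; numpy assumed) compares, over many replications of the no-intercept model, the variance of the least squares slope with the variance of another linear unbiased estimator, $\tilde\beta = \bar y / \bar x$, whose weights $d_i = 1/(n\bar x)$ also satisfy $\sum_i d_i x_i = 1$.

    import numpy as np

    rng = np.random.default_rng(1)
    n, beta_true, sigma = 50, 2.0, 1.0          # hypothetical settings
    x = rng.uniform(1.0, 5.0, size=n)           # fixed design, reused in every replication

    ols, alt = [], []
    for _ in range(5000):
        y = beta_true * x + rng.normal(0.0, sigma, size=n)    # no-intercept model
        ols.append(np.sum(x * y) / np.sum(x ** 2))            # d_i = x_i / sum(x_i^2)
        alt.append(y.mean() / x.mean())                       # d_i = 1 / (n * x_bar), also unbiased

    print("mean OLS:", np.mean(ols), " mean alt:", np.mean(alt))   # both are close to beta_true
    print("var  OLS:", np.var(ols),  " var  alt:", np.var(alt))    # the OLS variance is the smaller one

Both estimators are unbiased, but the least squares weights give the smaller sampling variance, as the derivation above predicts.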

1.3 Conditional Normal Model

We made very few assumptions to derive the least squares estimator. Specifically, we specified the conditional mean of $y_i$ and its variance, and assumed statistical independence of the individual observations. As a result of these assumptions we were able to derive the least squares estimator and show its unbiasedness and efficiency.

Now let us make a much stronger assumption, that
$$y_i \sim N\!\left(\alpha + \beta x_i,\ \sigma^2\right), \quad i = 1, \ldots, n.$$
Alternatively it can be defined as
$$y_i = \alpha + \beta x_i + \varepsilon_i, \qquad \varepsilon_i \overset{iid}{\sim} N\!\left(0, \sigma^2\right), \quad i = 1, \ldots, n.$$
Then the joint pdf is given by
$$f\!\left(y \mid \alpha, \beta, \sigma^2\right) = \prod_{i=1}^n f\!\left(y_i \mid \alpha, \beta, \sigma^2\right) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left[-\frac{(y_i - \alpha - \beta x_i)^2}{2\sigma^2}\right] = \frac{1}{\left(2\pi\sigma^2\right)^{n/2}} \exp\!\left[-\frac{\sum_{i=1}^n (y_i - \alpha - \beta x_i)^2}{2\sigma^2}\right].$$
To find the maximum likelihood estimates we form the log-likelihood function
$$\log f\!\left(y \mid \alpha, \beta, \sigma^2\right) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - \alpha - \beta x_i)^2.$$
From this we see that for any fixed value of $\sigma^2$ the log-likelihood function is maximized where the least squares function $\sum_{i=1}^n (y_i - \alpha - \beta x_i)^2$ is minimized, and therefore the ML estimates of the parameters $\alpha$ and $\beta$ should equal the OLS estimates:
$$\hat\beta_{ML} = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^n (x_i - \bar x)^2}, \qquad \hat\alpha_{ML} = \bar y - \hat\beta_{ML}\,\bar x.$$
Substituting these into the log-likelihood function,
$$-\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^n \bigl(y_i - \hat\alpha - \hat\beta x_i\bigr)^2,$$
we maximize it with respect to $\sigma^2$. The derivative with respect to $\sigma^2$ is set to zero,
$$-\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^n \bigl(y_i - \hat\alpha - \hat\beta x_i\bigr)^2 = 0,$$
which gives the solution
$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n \bigl(y_i - \hat\alpha - \hat\beta x_i\bigr)^2,$$
which is the RSS evaluated at the least squares line divided by the sample size.

Define the residuals from the regression to be
$$\hat\varepsilon_i = y_i - \hat\alpha - \hat\beta x_i.$$
Since $\hat\alpha$ and $\hat\beta$ are unbiased estimates, $E\hat\varepsilon_i = 0$. It can be shown that
$$E\hat\sigma^2 = \frac{n-2}{n}\,\sigma^2.$$
Thus an unbiased estimator of $\sigma^2$ would be
$$s^2 = \frac{n}{n-2}\,\hat\sigma^2.$$
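The downward bias of $\hat\sigma^2$ and the effect of the degrees-of-freedom correction can be checked numerically. The following sketch (simulated data with hypothetical parameter values; numpy assumed) averages $\hat\sigma^2 = RSS/n$ and $s^2 = RSS/(n-2)$ over many replications.

    import numpy as np

    rng = np.random.default_rng(2)
    n, alpha, beta, sigma2 = 20, 1.0, 0.5, 4.0    # hypothetical true values
    x = rng.normal(0.0, 1.0, size=n)              # fixed regressor

    sig2_ml, s2 = [], []
    for _ in range(20000):
        y = alpha + beta * x + rng.normal(0.0, np.sqrt(sigma2), size=n)
        b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
        a = y.mean() - b * x.mean()
        rss = np.sum((y - a - b * x) ** 2)
        sig2_ml.append(rss / n)                   # ML estimate, biased downward
        s2.append(rss / (n - 2))                  # degrees-of-freedom corrected estimate

    print(np.mean(sig2_ml))   # close to (n - 2) / n * sigma2 = 3.6
    print(np.mean(s2))        # close to sigma2 = 4.0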

Theorem
The sampling distributions of the estimators $\hat\alpha$, $\hat\beta$ for the conditional normal model are
$$\hat\alpha \sim N\!\left(\alpha,\ \frac{\sigma^2}{n S_{xx}}\sum_{i=1}^n x_i^2\right), \qquad \hat\beta \sim N\!\left(\beta,\ \frac{\sigma^2}{S_{xx}}\right), \qquad \operatorname{cov}\!\left(\hat\alpha, \hat\beta\right) = -\frac{\sigma^2 \bar x}{S_{xx}}.$$
Also $\hat\alpha$, $\hat\beta$ and $s^2$ are independent, and
$$\frac{(n-2)\,s^2}{\sigma^2} \sim \chi^2_{n-2}.$$

Proof
Since $\hat\alpha$ and $\hat\beta$ are linear combinations of the normally distributed $y_i$, they must be normal. Then
$$E\hat\beta_{OLS} = \frac{\sum_{i=1}^n (x_i - \bar x)\, E(y_i - \bar y)}{\sum_{i=1}^n (x_i - \bar x)^2} = \frac{\beta \sum_{i=1}^n (x_i - \bar x)(x_i - \bar x)}{\sum_{i=1}^n (x_i - \bar x)^2} = \beta,$$
and, using the representation $\hat\beta_{OLS} = \sum_{i=1}^n (x_i - \bar x) y_i / S_{xx}$ derived below,
$$\operatorname{Var}\hat\beta = \frac{\sum_{i=1}^n (x_i - \bar x)^2 \operatorname{Var}(y_i)}{\left(\sum_{i=1}^n (x_i - \bar x)^2\right)^2} = \frac{\sigma^2 \sum_{i=1}^n (x_i - \bar x)^2}{\left(\sum_{i=1}^n (x_i - \bar x)^2\right)^2} = \frac{\sigma^2}{S_{xx}}.$$

It can be shown that the estimator $\hat\alpha_{OLS}$ can be expressed as
$$\hat\alpha_{OLS} = \sum_{i=1}^n c_i y_i, \qquad \text{where} \qquad c_i = \frac{1}{n} - \frac{(x_i - \bar x)\,\bar x}{S_{xx}}.$$
Then it can be shown that $E\hat\alpha_{OLS} = \alpha$, that $\operatorname{Var}\hat\alpha_{OLS} = \frac{\sigma^2}{n S_{xx}}\sum_{i=1}^n x_i^2$, and also that $\operatorname{cov}(\hat\alpha, \hat\beta) = -\frac{\sigma^2 \bar x}{S_{xx}}$, which I leave as an exercise.
The second part of the theorem is somewhat more difficult to prove because it involves considerable algebraic manipulation. In fact, it is much easier to prove using matrix algebra, which we will do later. Here I just give a sketch of the proof.
The random variable $(n-2)s^2/\sigma^2$ is a function of the estimated residuals $\hat\varepsilon_i = y_i - \hat\alpha - \hat\beta x_i$. The regression itself can be thought of as a decomposition of the observed values $y_i$ into the predicted value $\hat y_i = \hat\alpha + \hat\beta x_i$ and the residual $\hat\varepsilon_i = y_i - \hat y_i$. They are uncorrelated because
$$\sum_{i=1}^n \hat y_i \hat\varepsilon_i = \hat\alpha \sum_{i=1}^n \hat\varepsilon_i + \hat\beta \sum_{i=1}^n x_i \hat\varepsilon_i = 0,$$
where $\sum_{i=1}^n \hat\varepsilon_i = 0$ and $\sum_{i=1}^n x_i \hat\varepsilon_i = 0$ follow from the first-order conditions of the ML estimator taken with respect to the parameters $\hat\alpha$ and $\hat\beta$. Zero correlation implies independence under normality. Moreover,
$$\sum_{i=1}^n \frac{\varepsilon_i^2}{\sigma^2} \sim \chi^2_n$$
by construction. And
$$\sum_{i=1}^n \varepsilon_i^2 = \sum_{i=1}^n (y_i - \alpha - \beta x_i)^2 = \sum_{i=1}^n (y_i - \hat y_i + \hat y_i - \alpha - \beta x_i)^2 = \sum_{i=1}^n \hat\varepsilon_i^2 + \sum_{i=1}^n \bigl(\hat\alpha - \alpha + (\hat\beta - \beta) x_i\bigr)^2$$
(the cross product vanishes by the same first-order conditions) is a decomposition into two terms, $\frac{(n-2)s^2}{\sigma^2} = \frac{1}{\sigma^2}\sum_{i=1}^n \hat\varepsilon_i^2$ and $\frac{1}{\sigma^2}\sum_{i=1}^n \bigl(\hat\alpha - \alpha + (\hat\beta - \beta) x_i\bigr)^2$, which are $\chi^2_{n-2}$ and $\chi^2_{2}$ distributed respectively. However, this is not a formal proof.
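The variance and covariance formulas in the theorem can also be checked by simulation. A minimal sketch (hypothetical fixed design and parameter values; numpy assumed):

    import numpy as np

    rng = np.random.default_rng(6)
    n, alpha, beta, sigma = 30, 1.0, 0.5, 1.5             # hypothetical true values
    x = rng.uniform(0.0, 10.0, size=n)                     # fixed design
    S_xx = np.sum((x - x.mean()) ** 2)

    a_draws, b_draws = [], []
    for _ in range(20000):
        y = alpha + beta * x + rng.normal(0.0, sigma, size=n)
        b = np.sum((x - x.mean()) * (y - y.mean())) / S_xx
        a = y.mean() - b * x.mean()
        a_draws.append(a)
        b_draws.append(b)

    print(np.var(b_draws), sigma ** 2 / S_xx)                              # Var(beta_hat)
    print(np.var(a_draws), sigma ** 2 * np.sum(x ** 2) / (n * S_xx))       # Var(alpha_hat)
    print(np.cov(a_draws, b_draws)[0, 1], -sigma ** 2 * x.mean() / S_xx)   # cov(alpha_hat, beta_hat)

The simulated moments should be close to the theoretical expressions stated in the theorem.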

Statistical inference about the parameters $\alpha$ and $\beta$ can be based on the following Student's $t$-distributions:
$$\frac{\hat\alpha - \alpha}{s\sqrt{\sum_{i=1}^n x_i^2 / (n S_{xx})}} \sim t_{n-2}, \qquad \frac{\hat\beta - \beta}{s / \sqrt{S_{xx}}} \sim t_{n-2}.$$
In fact, the joint distribution of these two $t$-statistics is a bivariate Student's $t$-distribution. Practitioners usually concentrate their interest on statistical inference about the slope parameter. If $\beta = 0$ then there is no linear relationship between $y_i$ and $x_i$. The test of the null hypothesis
$$H_0: \beta = 0$$
against
$$H_1: \beta \neq 0$$
can be based on
$$\frac{\hat\beta}{s / \sqrt{S_{xx}}} \sim t_{n-2},$$
or equivalently on
$$\frac{\hat\beta^2}{s^2 / S_{xx}} \sim F_{1, n-2}.$$
In fact,
$$\frac{\hat\beta^2}{s^2 / S_{xx}} = \frac{S_{xy}^2 / S_{xx}}{s^2} = \frac{S_{xy}^2 / S_{xx}}{RSS / (n-2)}.$$

The quantity in the numerator, $S_{xy}^2 / S_{xx}$, has a special name: the regression sum of squares. In the identity
$$\sum_{i=1}^n (y_i - \bar y)^2 = \sum_{i=1}^n (\hat y_i - \bar y)^2 + \sum_{i=1}^n (y_i - \hat y_i)^2,$$
Total sum of squares = Regression sum of squares + Residual sum of squares,
where $\hat y_i = \hat\alpha + \hat\beta x_i$. It can also be shown that
$$\sum_{i=1}^n (\hat y_i - \bar y)^2 = S_{xy}^2 / S_{xx}.$$
This is left as an exercise.

A measure of goodness of fit usually used for linear regression is the coefficient of determination
$$r^2 = \frac{\text{Regression sum of squares}}{\text{Total sum of squares}} = \frac{\sum_{i=1}^n (\hat y_i - \bar y)^2}{\sum_{i=1}^n (y_i - \bar y)^2} = \frac{S_{xy}^2}{S_{xx} S_{yy}}.$$
It satisfies $0 \le r^2 \le 1$ and it measures the proportion of variation in the dependent variable explained by the regression line.
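The relationships between the $t$-statistic, the $F$-statistic and $r^2$ can be verified numerically. A minimal sketch (simulated data with hypothetical parameter values; numpy assumed):

    import numpy as np

    rng = np.random.default_rng(3)
    n = 100
    x = rng.normal(0.0, 1.0, size=n)
    y = 1.0 + 0.3 * x + rng.normal(0.0, 1.0, size=n)   # hypothetical data-generating process

    x_bar, y_bar = x.mean(), y.mean()
    S_xx = np.sum((x - x_bar) ** 2)
    S_yy = np.sum((y - y_bar) ** 2)
    S_xy = np.sum((x - x_bar) * (y - y_bar))

    b = S_xy / S_xx
    a = y_bar - b * x_bar
    rss = np.sum((y - a - b * x) ** 2)
    s2 = rss / (n - 2)

    t_stat = b / np.sqrt(s2 / S_xx)            # t-statistic for H0: beta = 0
    F_stat = b ** 2 / (s2 / S_xx)              # F-statistic, equals t_stat ** 2
    r2 = S_xy ** 2 / (S_xx * S_yy)             # coefficient of determination

    print(t_stat ** 2, F_stat)                 # identical up to rounding
    print(r2, (S_xy ** 2 / S_xx) / S_yy)       # regression SS divided by total SS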

1.4 Consistency

The least squares estimator is consistent. Write $y_i = \alpha + \beta x_i + u_i$, where $u_i$ denotes the error term. Then
$$\hat\beta = \frac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^n (x_i - \bar x)^2} = \frac{\sum_{i=1}^n (x_i - \bar x)\bigl[\beta(x_i - \bar x) + (u_i - \bar u)\bigr]}{\sum_{i=1}^n (x_i - \bar x)^2} = \beta + \frac{\sum_{i=1}^n (x_i - \bar x)(u_i - \bar u)}{\sum_{i=1}^n (x_i - \bar x)^2} = \beta + \frac{\frac{1}{n}\sum_{i=1}^n (x_i - \bar x) u_i}{\frac{1}{n}\sum_{i=1}^n (x_i - \bar x)^2}.$$
The first term of this sum is $\beta$, which is a constant. The second term is a ratio of a weighted average of the random variables $u_i$ to the non-stochastic term $\frac{1}{n}\sum_{i=1}^n (x_i - \bar x)^2$. We assume that as $n \to \infty$ the numerical series is convergent, $\frac{1}{n}\sum_{i=1}^n (x_i - \bar x)^2 \to Q_{xx} < \infty$. We can think of $\frac{1}{n}\sum_{i=1}^n (x_i - \bar x) u_i$ as the sample mean of the random variables $(x_i - \bar x) u_i$, for which
$$E\bigl[(x_i - \bar x) u_i\bigr] = 0, \qquad \operatorname{Var}\bigl[(x_i - \bar x) u_i\bigr] = \sigma^2 (x_i - \bar x)^2 < \infty.$$
An extension of the law of large numbers discussed before, applied to an independently but not identically distributed sequence of random variables, can be utilized here to establish that
$$\frac{1}{n}\sum_{i=1}^n (x_i - \bar x) u_i \overset{p}{\to} E\left[\frac{1}{n}\sum_{i=1}^n (x_i - \bar x) u_i\right] = 0.$$
Therefore,
$$\hat\beta = \beta + \frac{\frac{1}{n}\sum_{i=1}^n (x_i - \bar x) u_i}{\frac{1}{n}\sum_{i=1}^n (x_i - \bar x)^2} \overset{p}{\to} \beta.$$
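To see the consistency at work, the following sketch (hypothetical simulated design and parameter values; numpy assumed) computes $\hat\beta$ for increasing sample sizes and shows it settling near the true $\beta$.

    import numpy as np

    rng = np.random.default_rng(4)
    beta_true = 0.7                              # hypothetical true slope
    for n in [50, 500, 5000, 50000]:
        x = rng.uniform(0.0, 10.0, size=n)
        u = rng.normal(0.0, 2.0, size=n)
        y = 1.0 + beta_true * x + u
        b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
        print(n, b)                              # the estimates approach 0.7 as n grows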

1.5 Asymptotic Normality

Given the condition $\frac{1}{n}\sum_{i=1}^n (x_i - \bar x)^2 \to Q_{xx} < \infty$, the series $\frac{1}{n}\sum_{i=1}^n (x_i - \bar x) u_i$ converges to a degenerate random variable with variance zero. Indeed,
$$\operatorname{Var}\left[\frac{1}{n}\sum_{i=1}^n (x_i - \bar x) u_i\right] = \frac{1}{n^2}\sum_{i=1}^n (x_i - \bar x)^2\,\sigma^2 = \frac{\sigma^2}{n}\left[\frac{1}{n}\sum_{i=1}^n (x_i - \bar x)^2\right] \to 0 \quad \text{as } n \to \infty,$$
since the term in brackets converges to the finite $Q_{xx}$. Therefore, $\hat\beta - \beta$ converges to a random variable whose distribution places all probability mass of one on the single value of zero. The rate of convergence of $\hat\beta - \beta$ to zero is very fast. However, this convergence can be slowed down by multiplying it by $\sqrt{n}$. Then
$$\sqrt{n}\bigl(\hat\beta - \beta\bigr) = \frac{\frac{1}{\sqrt{n}}\sum_{i=1}^n (x_i - \bar x) u_i}{\frac{1}{n}\sum_{i=1}^n (x_i - \bar x)^2}.$$

Denote
$$z_n = \frac{1}{\sqrt{n}}\sum_{i=1}^n (x_i - \bar x) u_i,$$
so that
$$E z_n = 0, \qquad \operatorname{Var} z_n = \frac{\sigma^2}{n}\sum_{i=1}^n (x_i - \bar x)^2 < \infty.$$

Table 1: Summary statistics.

Variable    Description                              Mean     Std. dev.
Lntdcexp    logarithm of total dental expenditure    5.108    1.183
Lnincome    logarithm of income                      3.397    0.910

Then a central limit theorem for such an i.n.i.d. series of random variables can be applied, so that
$$\frac{z_n - E z_n}{\sqrt{\operatorname{Var} z_n}} \overset{d}{\to} N(0, 1),$$
or
$$z_n \overset{d}{\to} N\!\left(0, \sigma^2 Q_{xx}\right).$$
Then
$$\sqrt{n}\bigl(\hat\beta - \beta\bigr) = \frac{\frac{1}{\sqrt{n}}\sum_{i=1}^n (x_i - \bar x) u_i}{\frac{1}{n}\sum_{i=1}^n (x_i - \bar x)^2} \overset{d}{\to} N\!\left(0, \sigma^2 Q_{xx}^{-1}\right).$$
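The asymptotic distribution can also be checked by simulation. In the sketch below (hypothetical settings; numpy assumed), $\sqrt{n}(\hat\beta - \beta)$ is computed over many replications and its sample variance is compared with $\sigma^2 Q_{xx}^{-1}$, where $Q_{xx}$ is approximated by $\frac{1}{n}\sum_i (x_i - \bar x)^2$ of the simulated regressor.

    import numpy as np

    rng = np.random.default_rng(5)
    n, beta_true, sigma = 500, 0.7, 2.0                      # hypothetical settings
    x = rng.uniform(0.0, 10.0, size=n)                       # fixed design
    Qxx = np.sum((x - x.mean()) ** 2) / n

    draws = []
    for _ in range(10000):
        y = 1.0 + beta_true * x + rng.normal(0.0, sigma, size=n)
        b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
        draws.append(np.sqrt(n) * (b - beta_true))

    print(np.var(draws), sigma ** 2 / Qxx)    # sample variance close to sigma^2 / Qxx
    print(np.mean(draws))                     # close to 0; a histogram of draws looks normal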

Dental Expenditure

We investigate the effect of income on total dental expenditure. The data set is derived from the Medical Expenditure Panel Survey (MEPS). MEPS is a nationally representative survey of health care, including dental, expenditure, sources of payment and insurance coverage for the US civilian non-institutionalized population. We use data from the 1996, 1997, 1998, 1999 and 2000 surveys. The sampling scheme of the MEPS data is a two-year overlapping panel, i.e., in each calendar year after the first survey year, one sample of persons is in its second year of responses while another sample of persons is in its first year of responses. The sample is restricted to the U.S. population between the ages of 25 and 64 years and to those who are employed. Further, we take only those observations whose total dental expenditure and income are positive. To reduce heterogeneity coming from the fact that the effect of income should be structurally different for those with and without dental insurance, we restrict our sample to those without dental insurance. The total number of observations in the data set is 2737. We use a logarithmic transformation of income and dental expenditure to obtain more symmetric distributions. Table 1 gives the summary statistics.

Below is the plot of the data, indicating that Lntdcexp and Lnincome are potentially bivariate normally distributed. When two random variables $(y_i, x_i) \sim BN\!\left(\mu_x, \mu_y, \sigma_x^2, \sigma_y^2, \rho\right)$, the conditional distribution of $y_i$ given $x_i$ has mean
$$E(y_i \mid x_i) = \mu_y + \rho\,\frac{\sigma_y}{\sigma_x}\,(x_i - \mu_x).$$
This means that the bivariate normal model implies that $E(y_i \mid x_i)$ is a linear function of $x_i$. The estimated regression output is reported below.
[Figure: scatter plot of lntdcexp against lnincome.]

      Source |       SS           df       MS        Number of obs  =      2737
-------------+----------------------------------     F(1, 2735)     =     32.37
       Model |  44.8104909         1   44.8104909    Prob > F       =    0.0000
    Residual |   3786.1013      2735   1.38431492    R-squared      =    0.0117
-------------+----------------------------------     Adj R-squared  =    0.0113
       Total |  3830.91179      2736   1.40018706    Root MSE       =    1.1766

------------------------------------------------------------------------------
    lntdcexp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lnincome |    .140637    .0247188    5.69   0.000      .0921676    .1891064
       _cons |   4.630147    .0869183   53.27   0.000      4.459715    4.800579
------------------------------------------------------------------------------

[Figure: scatter plot of lntdcexp against lnincome with the fitted values from the regression.]

The null hypothesis
$$H_0: \beta = 0$$
is rejected with a t-ratio of 5.69 and a p-value smaller than 0.1%. The fitted regression line is plotted above. As you can see from the regression results, the linear model is a poor fit for the data: only roughly 1% of the variation in expenditure is explained by income. More explanatory variables are needed to improve the fit.
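For readers working in Python rather than Stata, a regression of this form could be reproduced roughly as follows. This is only a sketch: the file name meps_dental.csv and the variable names lntdcexp and lnincome are hypothetical placeholders for however the MEPS extract is stored, and pandas and statsmodels are assumed to be available.

    import pandas as pd
    import statsmodels.api as sm

    # Hypothetical file; assumed to contain the log-transformed variables
    df = pd.read_csv("meps_dental.csv")

    X = sm.add_constant(df["lnincome"])        # adds the intercept column
    model = sm.OLS(df["lntdcexp"], X).fit()
    print(model.summary())                     # coefficient table, t-ratios, R-squared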


Problems
1. Prove that
$$\min_a \sum_{i=1}^n (x_i - a)^2 = \sum_{i=1}^n (x_i - \bar x)^2.$$

2. Show that
$$\operatorname{Var}\hat\sigma^2_{ML} = \frac{2(n-2)\sigma^4}{n^2}.$$

3. Show that the estimator $\hat\alpha_{OLS} = \bar y - \hat\beta_{OLS}\,\bar x$ can be expressed as
$$\hat\alpha_{OLS} = \sum_{i=1}^n c_i y_i, \qquad \text{where} \qquad c_i = \frac{1}{n} - \frac{(x_i - \bar x)\,\bar x}{S_{xx}}.$$
Verify that $E\hat\alpha_{OLS} = \alpha$ and $\operatorname{Var}\hat\alpha_{OLS} = \frac{\sigma^2}{n S_{xx}}\sum_{i=1}^n x_i^2$, and verify that $\operatorname{cov}(\hat\alpha, \hat\beta) = -\frac{\sigma^2 \bar x}{S_{xx}}$.

Solution: Note that
$$\hat\alpha = \bar y - \hat\beta \bar x = \sum_{i=1}^n \frac{1}{n}\,y_i - \sum_{i=1}^n \frac{(x_i - \bar x)\,\bar x}{S_{xx}}\,y_i = \sum_{i=1}^n \left[\frac{1}{n} - \frac{(x_i - \bar x)\,\bar x}{S_{xx}}\right] y_i = \sum_{i=1}^n c_i y_i,$$
which means that
$$E\hat\alpha = \alpha \sum_{i=1}^n c_i + \beta \sum_{i=1}^n c_i x_i = \alpha,$$
since $\sum_{i=1}^n c_i = 1$ and $\sum_{i=1}^n c_i x_i = \bar x - \bar x = 0$, and
$$\operatorname{Var}\hat\alpha = \sigma^2 \sum_{i=1}^n c_i^2 = \sigma^2 \sum_{i=1}^n \left[\frac{1}{n^2} + \frac{(x_i - \bar x)^2 \bar x^2}{S_{xx}^2} - \frac{2 (x_i - \bar x)\,\bar x}{n\,S_{xx}}\right] = \sigma^2 \left[\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right] = \frac{\sigma^2}{n S_{xx}}\sum_{i=1}^n x_i^2.$$
Finally, to verify the last fact we first notice that
$$\hat\beta_{OLS} = \frac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^n (x_i - \bar x)^2} = \frac{\sum_{i=1}^n (x_i - \bar x) y_i - \bar y \sum_{i=1}^n (x_i - \bar x)}{\sum_{i=1}^n (x_i - \bar x)^2} = \frac{\sum_{i=1}^n (x_i - \bar x) y_i}{\sum_{i=1}^n (x_i - \bar x)^2}.$$
Then
$$\operatorname{cov}\!\left(\hat\alpha, \hat\beta\right) = \operatorname{cov}\!\left(\sum_{i=1}^n \left[\frac{1}{n} - \frac{(x_i - \bar x)\,\bar x}{S_{xx}}\right] y_i,\ \sum_{i=1}^n \frac{x_i - \bar x}{S_{xx}}\,y_i\right) = \sigma^2 \sum_{i=1}^n \left[\frac{1}{n} - \frac{(x_i - \bar x)\,\bar x}{S_{xx}}\right]\frac{x_i - \bar x}{S_{xx}} = -\frac{\sigma^2 \bar x}{S_{xx}}.$$

4. Show that
$$\sum_{i=1}^n (y_i - \bar y)^2 = \sum_{i=1}^n (\hat y_i - \bar y)^2 + \sum_{i=1}^n (y_i - \hat y_i)^2.$$

Solution: We need to show that the cross product is zero:
$$\sum_{i=1}^n (\hat y_i - \bar y)(y_i - \hat y_i) = \sum_{i=1}^n \bigl[\hat\alpha + \hat\beta x_i - \bar y\bigr]\bigl[y_i - \hat\alpha - \hat\beta x_i\bigr].$$
Substitute $\hat\alpha = \bar y - \hat\beta \bar x$ to get
$$= \sum_{i=1}^n \hat\beta (x_i - \bar x)\bigl[(y_i - \bar y) - \hat\beta (x_i - \bar x)\bigr] = \hat\beta \sum_{i=1}^n (y_i - \bar y)(x_i - \bar x) - \hat\beta^2 \sum_{i=1}^n (x_i - \bar x)^2 = 0$$
because of the definition of $\hat\beta$.

5. Show that
$$\sum_{i=1}^n (\hat y_i - \bar y)^2 = S_{xy}^2 / S_{xx}.$$
Solution:
$$\sum_{i=1}^n (\hat y_i - \bar y)^2 = \hat\beta^2 \sum_{i=1}^n (x_i - \bar x)^2 = S_{xy}^2 / S_{xx}.$$

6. Run four simple regressions for the data sets provided in the table below (Anscombe's quartet). Report your results and plot the data sets on four different graphs. What conclusion can you make from this? (A code sketch for setting up the four regressions is given after the table.)

Anscombe's quartet:

  x1      y1      x2      y2      x3      y3      x4      y4
 10.0    8.04    10.0    9.14    10.0    7.46     8.0    6.58
  8.0    6.95     8.0    8.14     8.0    6.77     8.0    5.76
 13.0    7.58    13.0    8.74    13.0   12.74     8.0    7.71
  9.0    8.81     9.0    8.77     9.0    7.11     8.0    8.84
 11.0    8.33    11.0    9.26    11.0    7.81     8.0    8.47
 14.0    9.96    14.0    8.10    14.0    8.84     8.0    7.04
  6.0    7.24     6.0    6.13     6.0    6.08     8.0    5.25
  4.0    4.26     4.0    3.10     4.0    5.39    19.0   12.50
 12.0   10.84    12.0    9.13    12.0    8.15     8.0    5.56
  7.0    4.82     7.0    7.26     7.0    6.42     8.0    7.91
  5.0    5.68     5.0    4.74     5.0    5.73     8.0    6.89
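A minimal sketch for setting up problem 6 (assuming numpy and matplotlib are available; the data are typed in from the table above) runs the four regressions and plots each pair with its fitted line:

    import numpy as np
    import matplotlib.pyplot as plt

    x123 = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
    x4   = np.array([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8], dtype=float)
    ys = [
        np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
        np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
        np.array([7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
        np.array([6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
    ]
    xs = [x123, x123, x123, x4]

    fig, axes = plt.subplots(2, 2)
    for k, (x, y, ax) in enumerate(zip(xs, ys, axes.ravel()), start=1):
        b, a = np.polyfit(x, y, deg=1)                # slope, intercept
        ax.scatter(x, y)
        grid = np.linspace(x.min(), x.max(), 100)
        ax.plot(grid, a + b * grid)
        ax.set_title(f"data set {k}: slope={b:.2f}, intercept={a:.2f}")
    plt.show()

The four fitted lines are nearly identical even though the scatter plots look very different, which is the point of the exercise.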
7. Let $x$ and $y$ be independent random variables with means $\mu_1$, $\mu_2$ and variances $\sigma_1^2$, $\sigma_2^2$. Determine the correlation coefficient of $x$ and $z = x - y$ in terms of $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$.
8. Suppose we estimate the model
$$y_i = \alpha + u_i, \qquad u_i \sim N\!\left(0, \sigma^2\right), \quad i = 1, \ldots, N.$$
(a) Show that the OLS estimator of $\alpha$ is $\hat\alpha = \bar y$.
(b) Directly obtain the variance of $\bar y$.


9. Consider the model
$$y = (\alpha + \beta x)\,e,$$
where $y$ and $x$ are scalar observables and $e$ is unobservable. Let $E[e \mid x] = 1$ and $\operatorname{Var}[e \mid x] = 1$. How would you estimate $\alpha$ and $\beta$ by OLS? How would you construct the standard errors?
10. Let
$$z_n \overset{d}{\to} N\!\left(0, \sigma^2 Q_{xx}\right),$$
where $Q_{xx} = \lim_{n \to \infty} \frac{1}{n}\sum_{i=1}^n (x_i - \bar x)^2$. Then show that
$$\frac{z_n}{\frac{1}{n}\sum_{i=1}^n (x_i - \bar x)^2} \overset{d}{\to} N\!\left(0, \sigma^2 Q_{xx}^{-1}\right).$$
